BOLT01: add wire TLV proposal
cfromknecht committed May 24, 2019
1 parent d42b4e2 commit 2c69fc2
Showing 2 changed files with 122 additions and 0 deletions.
12 changes: 12 additions & 0 deletions .aspell.en.pws
@@ -330,3 +330,15 @@ zlib
ZLIB
APIs
duplicative
TLV
namespace
namespaces
verifier
verifiers
EOF
monotonicity
varint
varbytes
optimizations
structs
xff
110 changes: 110 additions & 0 deletions 01-messaging.md
@@ -13,6 +13,7 @@ All data fields are unsigned big-endian unless otherwise specified.

* [Connection Handling and Multiplexing](#connection-handling-and-multiplexing)
* [Lightning Message Format](#lightning-message-format)
* [Wire Type-Length-Value Format](#wire-type-length-value-format)
* [Setup Messages](#setup-messages)
* [The `init` Message](#the-init-message)
* [The `error` Message](#the-error-message)
@@ -82,6 +83,115 @@ however, adding a 6-byte padding after the type field was considered
wasteful: alignment may be achieved by decrypting the message into
a buffer with 6-bytes of pre-padding.


## Wire Type-Length-Value Format

Throughout the protocol, we use a TLV (Type-Length-Value) format to allow for
the backwards-compatible addition of new fields to existing message types. This
section describes the TLV encoding specific to wire messages.

A `tlv_record` represents a single field, encoded in the form:

* [`1`: `type`]
* [`varint`: `length`]
* [`length`: `value`]

A `tlv_stream` is a series of (possibly zero) `tlv_record`s, represented as the
concatenation of the encoded `tlv_record`s. When used to extend existing
messages, a `tlv_stream` is typically placed after all currently defined fields.

The `type` is a message-specific, 8-bit identifier for the `tlv_record`
determining how the contents of `value` should be decoded. The `type` 0x00 is
reserved for use as a sentinel value; a sentinel record has neither a `length`
nor `value`.

The `length` is a varint encoded using the format defined below to signal the
size of `value` in bytes.

The `value` depends entirely on the `type`. Each `tlv_stream` has its own 8-bit
namespace for defining the meaning of its `type` identifiers.

A variable-length integer `v` in the range `[0, 2^16)` should be encoded in the
following format:
- if `v` < 0xff:
  - write `uint8(v)`
- otherwise:
  - write `(0xff, uint8(v), uint8(v >> 8))`.

Decoders should fail if the value is not minimally encoded.
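
The varint rules above can be sketched as follows. This is a minimal
illustration, not a reference implementation: the function names
`encode_varint` and `decode_varint` are hypothetical, and raising `ValueError`
is just one way a decoder might "fail".

```python
from typing import Tuple


def encode_varint(v: int) -> bytes:
    """Encode v in [0, 2^16) using the format described above."""
    if not 0 <= v < 1 << 16:
        raise ValueError("varint out of range")
    if v < 0xFF:
        return bytes([v])
    # 0xff marker, then uint8(v), then uint8(v >> 8), in the order listed above.
    return bytes([0xFF, v & 0xFF, v >> 8])


def decode_varint(buf: bytes, offset: int = 0) -> Tuple[int, int]:
    """Return (value, next_offset); raise on truncated or non-minimal input."""
    if offset >= len(buf):
        raise ValueError("EOF while reading varint")
    first = buf[offset]
    if first < 0xFF:
        return first, offset + 1
    if offset + 3 > len(buf):
        raise ValueError("EOF while reading varint")
    v = buf[offset + 1] | (buf[offset + 2] << 8)
    if v < 0xFF:
        # The value would have fit in the single-byte form.
        raise ValueError("varint not minimally encoded")
    return v, offset + 3
```

For example, `encode_varint(254)` yields the single byte `fe`, while
`encode_varint(255)` yields the three bytes `ff ff 00`.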

### Requirements

The sending node:
- MUST order `tlv_record`s in a `tlv_stream` by monotonically-increasing `type`.
- MUST NOT encode a `length` or `value` for `type` 0x00.
- MAY use `type` 0x00 to signal the end of a `tlv_stream`.
- SHOULD NOT use the varbytes encoding for `value`s containing byte arrays.
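
A sender following the requirements above might look like the sketch below.
The names `encode_tlv_stream` and `sentinel`, and the choice of a `dict` keyed
by `type`, are assumptions made for illustration; the dictionary simply
guarantees each `type` appears once, and sorting its keys gives the required
ordering.

```python
from typing import Dict


def encode_varint(v: int) -> bytes:
    """Varint for values in [0, 2^16), as defined in this section."""
    if not 0 <= v < 1 << 16:
        raise ValueError("varint out of range")
    if v < 0xFF:
        return bytes([v])
    return bytes([0xFF, v & 0xFF, v >> 8])


def encode_tlv_stream(records: Dict[int, bytes], sentinel: bool = False) -> bytes:
    """Serialize records in increasing type order, optionally ending with 0x00."""
    out = bytearray()
    for record_type in sorted(records):
        # Type 0x00 is reserved for the sentinel and carries no length or value.
        if not 1 <= record_type <= 0xFF:
            raise ValueError("record type must be in [1, 255]")
        value = records[record_type]
        out.append(record_type)
        out += encode_varint(len(value))
        out += value
    if sentinel:
        out.append(0x00)
    return bytes(out)
```

With these assumptions, `encode_tlv_stream({3: b"\xaa\xbb", 1: b"\x01"})`
produces `01 01 01 03 02 aa bb` regardless of the order in which the records
were supplied.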

The receiving node:
- if it encounters EOF while parsing a `type`:
  - MUST stop parsing the `tlv_stream`.
- if `type` is 0x00:
  - MUST stop parsing the `tlv_stream`.
- if it encounters EOF while parsing a `length` or `value`:
  - MUST fail to parse the `tlv_stream`.
- if `type` and `length` parse:
  - if `type` is known:
    - MUST decode the next `length` bytes using the known encoding for `type`.
  - otherwise, if `type` is unknown:
    - MUST discard the next `length` bytes.
- if non-sentinel `type`s are not monotonically-increasing:
  - MUST fail to parse the `tlv_stream`.
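
The receiver rules above could be implemented roughly as follows. This is a
sketch under stated assumptions: `parse_tlv_stream` and the `known` mapping
from `type` to a decoder callable are hypothetical names, failure is modelled
as `ValueError`, and "monotonically-increasing" is read as strictly increasing,
in line with the uniqueness argument in the rationale.

```python
from typing import Any, Callable, Dict, Tuple


def decode_varint(buf: bytes, offset: int) -> Tuple[int, int]:
    """Return (value, next_offset) for the varint format in this section."""
    if offset >= len(buf):
        raise ValueError("EOF while parsing length")
    first = buf[offset]
    if first < 0xFF:
        return first, offset + 1
    if offset + 3 > len(buf):
        raise ValueError("EOF while parsing length")
    v = buf[offset + 1] | (buf[offset + 2] << 8)
    if v < 0xFF:
        raise ValueError("length not minimally encoded")
    return v, offset + 3


def parse_tlv_stream(
    buf: bytes, known: Dict[int, Callable[[bytes], Any]]
) -> Tuple[Dict[int, Any], int]:
    """Return (decoded known records, number of bytes consumed)."""
    records: Dict[int, Any] = {}
    offset = 0
    last_type = 0
    while offset < len(buf):  # EOF while parsing a type: stop parsing
        record_type = buf[offset]
        offset += 1
        if record_type == 0x00:  # sentinel: stop parsing
            break
        if record_type <= last_type:
            raise ValueError("types are not increasing")
        last_type = record_type
        length, offset = decode_varint(buf, offset)
        end = offset + length
        if end > len(buf):
            raise ValueError("EOF while parsing value")
        if record_type in known:
            records[record_type] = known[record_type](buf[offset:end])
        # Unknown types are discarded by skipping over their value.
        offset = end
    return records, offset
```

Returning the number of bytes consumed lets a caller continue reading whatever
follows the sentinel.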

### Rationale

The primary advantage in using TLV is that a reader is able to ignore new fields
that it does not understand, since each field carries the exact size of the
encoded element. Without TLV, even if a node does not wish to use a particular
field, the node is forced to add parsing logic for that field in order to
determine the offset of any fields that follow.

The monotonicity constraint ensures that all `type`s are unique and can appear
at most once. Fields that map to complex objects, e.g. vectors, maps, or
structs, should do so by defining the encoding such that the object is
serialized within a single `tlv_record`. The uniqueness constraint, among other
things, enables the following optimizations:
- canonical ordering is defined independent of the encoded `value`s.
- canonical ordering can be known at compile-time, rather than being determined
  dynamically at the time of encoding.
- verifying canonical ordering requires less state and is less expensive.
- variable-size fields can reserve their expected size up front, rather than
  appending elements sequentially and incurring double-and-copy overhead.

The sentinel `type` allows the encoder to signal the end of a `tlv_stream` when
the length is not clear from the surrounding context. Its value is chosen as
0x00 so that the sentinel value is implicit if there exist zero-bytes past the
end of the encoded `tlv_stream`, which happens when constructing multi-frame
payloads and the remaining bytes are zero-padded to a multiple of the frame
size. It also permits the concatenation of `tlv_stream`s with distinct
namespaces, providing an alternative to nested `tlv_stream`s, or allowing other
data to be written after a `tlv_stream`. Additionally, a sentinel `type` allows
a `tlv_stream` to be written directly without needing to compute its total
serialized length.
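
To illustrate the implicit sentinel, the sketch below pads a stream with zero
bytes and then locates the end of the stream without knowing its length in
advance. The helper names `pad_to_frame` and `stream_end`, and the frame size
of 8 used in the example, are hypothetical and chosen only for demonstration.

```python
from typing import Tuple


def read_length(buf: bytes, offset: int) -> Tuple[int, int]:
    """Read the varint length defined in this section; return (value, next_offset)."""
    if offset >= len(buf):
        raise ValueError("EOF while parsing length")
    first = buf[offset]
    if first < 0xFF:
        return first, offset + 1
    if offset + 3 > len(buf):
        raise ValueError("EOF while parsing length")
    v = buf[offset + 1] | (buf[offset + 2] << 8)
    if v < 0xFF:
        raise ValueError("length not minimally encoded")
    return v, offset + 3


def pad_to_frame(stream: bytes, frame_size: int) -> bytes:
    """Zero-pad stream up to the next multiple of frame_size."""
    remainder = len(stream) % frame_size
    if remainder == 0:
        return stream
    return stream + bytes(frame_size - remainder)


def stream_end(buf: bytes, offset: int = 0) -> int:
    """Return the offset just past the tlv_stream that starts at offset.

    The stream ends at a 0x00 type byte (which is consumed) or at EOF.
    """
    while offset < len(buf):
        record_type = buf[offset]
        offset += 1
        if record_type == 0x00:
            return offset
        length, offset = read_length(buf, offset)
        if offset + length > len(buf):
            raise ValueError("EOF while parsing value")
        offset += length
    return offset
```

Padding the four-byte stream `01 02 aa bb` to a frame of 8 bytes gives
`01 02 aa bb 00 00 00 00`, and `stream_end` stops at offset 5, treating the
first padding byte as the sentinel. The same function can walk two concatenated
streams by calling it again from the returned offset.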

The use of a varint for `length` permits a space savings over a fixed 16-bit
`length` for `value`s whose encoded length is less than 255 bytes. When a
`tlv_stream` is used in multi-frame payloads, this can potentially save payloads
from spilling over into extra frames and increase the possible route length for
certain applications.
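
The size trade-off can be checked with a few lines of arithmetic. In this
sketch the function names are hypothetical; the sizes follow from the record
layout (one `type` byte, then `length`, then `value`) and the varint format
defined above, in which lengths 0 through 254 occupy one byte and larger
lengths occupy three.

```python
def varint_size(n: int) -> int:
    """Number of bytes the varint defined in this section uses for n."""
    if not 0 <= n < 1 << 16:
        raise ValueError("length out of range")
    return 1 if n < 0xFF else 3


def record_size_varint(value_len: int) -> int:
    """Size of a tlv_record with a varint length."""
    return 1 + varint_size(value_len) + value_len


def record_size_fixed16(value_len: int) -> int:
    """Size of the same record if length were a fixed 16-bit field."""
    return 1 + 2 + value_len
```

A 32-byte `value` takes 34 bytes with the varint and 35 with a fixed 16-bit
`length`, so each short record saves one byte. Records of 255 bytes or more pay
one extra byte instead.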

All non-sentinel `type` bytes must appear in increasing order to create a
canonical encoding of the underlying `tlv_record`s. This is crucial when
computing signatures over a `tlv_stream`, as it ensures verifiers will be able
to recompute the same message digest as the signer. Note that the canonical
ordering over the set of fields can be enforced even if the verifier does not
understand what the fields contain.
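
As a small demonstration of why canonical ordering matters for signatures, the
sketch below shows two writers that assemble the same fields in a different
order yet serialize to identical bytes, and therefore to the same digest.
SHA-256 is used purely as an assumption for illustration; this section does not
specify a digest function, and `stream_digest` is a hypothetical helper.

```python
import hashlib
from typing import Dict


def encode_varint(v: int) -> bytes:
    """Varint for values in [0, 2^16), as defined in this section."""
    if not 0 <= v < 1 << 16:
        raise ValueError("varint out of range")
    if v < 0xFF:
        return bytes([v])
    return bytes([0xFF, v & 0xFF, v >> 8])


def encode_tlv_stream(records: Dict[int, bytes]) -> bytes:
    """Serialize records in increasing type order (non-sentinel types only)."""
    out = bytearray()
    for record_type in sorted(records):
        if not 1 <= record_type <= 0xFF:
            raise ValueError("record type must be in [1, 255]")
        out.append(record_type)
        out += encode_varint(len(records[record_type]))
        out += records[record_type]
    return bytes(out)


def stream_digest(records: Dict[int, bytes]) -> str:
    """Hex digest over the canonical encoding (SHA-256 is an assumed choice)."""
    return hashlib.sha256(encode_tlv_stream(records)).hexdigest()


# The signer and the verifier build the same fields in different orders.
signer_fields = {1: b"alice", 7: b"\x00\x01"}
verifier_fields = {7: b"\x00\x01", 1: b"alice"}
same = stream_digest(signer_fields) == stream_digest(verifier_fields)
```

Here `same` is `True` because sorting by `type` removes any dependence on
insertion order.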

We recommend that writers not use the varbytes encoding for byte arrays since it
encodes the length twice, and also makes determining the outer `length` more
difficult. Failure to account for the length of the varint in the outer `length`
will result in corrupted values for the receiver.
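
The double length can be seen by encoding the same byte array both ways. This
sketch assumes "varbytes" means a varint length prefix followed by the raw
bytes; the helper names `record_raw` and `record_varbytes` are hypothetical.

```python
def encode_varint(v: int) -> bytes:
    """Varint for values in [0, 2^16), as defined in this section."""
    if not 0 <= v < 1 << 16:
        raise ValueError("varint out of range")
    if v < 0xFF:
        return bytes([v])
    return bytes([0xFF, v & 0xFF, v >> 8])


def record_raw(record_type: int, payload: bytes) -> bytes:
    """Recommended form: the value is the byte array itself."""
    return bytes([record_type]) + encode_varint(len(payload)) + payload


def record_varbytes(record_type: int, payload: bytes) -> bytes:
    """Discouraged form: the value carries its own varint length prefix."""
    inner = encode_varint(len(payload)) + payload
    # The outer length must cover the inner varint as well as the payload.
    return bytes([record_type]) + encode_varint(len(inner)) + inner
```

For the three-byte payload `abc`, the raw form is `01 03 61 62 63` while the
varbytes form is `01 04 03 61 62 63`. A writer that set the outer `length` to 3
in the second case would cause the receiver to read `03 61 62` as the `value`
and misinterpret the final byte as the next `type`.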

## Setup Messages

### The `init` Message