From 2c69fc2a6ebbb7a6bc863181fb97502a1031efd4 Mon Sep 17 00:00:00 2001
From: Conner Fromknecht
Date: Mon, 29 Apr 2019 19:35:56 -0700
Subject: [PATCH] BOLT01: add wire TLV proposal

---
 .aspell.en.pws  |  12 ++++++
 01-messaging.md | 110 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 122 insertions(+)

diff --git a/.aspell.en.pws b/.aspell.en.pws
index b1a1ad307..902854fc9 100644
--- a/.aspell.en.pws
+++ b/.aspell.en.pws
@@ -330,3 +330,15 @@ zlib
 ZLIB
 APIs
 duplicative
+TLV
+namespace
+namespaces
+verifier
+verifiers
+EOF
+monotonicity
+varint
+varbytes
+optimizations
+structs
+xff
diff --git a/01-messaging.md b/01-messaging.md
index 9ff3d55fd..ba3730ca7 100644
--- a/01-messaging.md
+++ b/01-messaging.md
@@ -13,6 +13,7 @@ All data fields are unsigned big-endian unless otherwise specified.
 
   * [Connection Handling and Multiplexing](#connection-handling-and-multiplexing)
   * [Lightning Message Format](#lightning-message-format)
+  * [Wire Type-Length-Value Format](#wire-type-length-value-format)
   * [Setup Messages](#setup-messages)
     * [The `init` Message](#the-init-message)
     * [The `error` Message](#the-error-message)
@@ -82,6 +83,115 @@
 however, adding a 6-byte padding after the type field was considered
 wasteful: alignment may be achieved by decrypting the message into
 a buffer with 6-bytes of pre-padding.
+
+## Wire Type-Length-Value Format
+
+Throughout the protocol, we use a TLV (Type-Length-Value) format to allow for
+the backwards-compatible addition of new fields to existing message types. This
+section describes the TLV encoding specific to wire messages.
+
+A `tlv_record` represents a single field, encoded in the form:
+
+* [`1`: `type`]
+* [`varint`: `length`]
+* [`length`: `value`]
+
+A `tlv_stream` is a series of (possibly zero) `tlv_record`s, represented as the
+concatenation of the encoded `tlv_record`s. When used to extend existing
+messages, a `tlv_stream` is typically placed after all currently defined fields.
+
+The `type` is a message-specific, 8-bit identifier for the `tlv_record`
+determining how the contents of `value` should be decoded. The `type` 0x00 is
+reserved for use as a sentinel value; a sentinel record has neither a `length`
+nor `value`.
+
+The `length` is a varint encoded using the format defined below to signal the
+size of `value` in bytes.
+
+The `value` depends entirely on the `type`. Each `tlv_stream` has its own 8-bit
+namespace for defining the meaning of its `type` identifiers.
+
+A variable-length integer `v` in the range `[0, 2^16)` should be encoded in the
+following format:
+ - if `v` < 0xff:
+   - write `uint8(v)`
+ - otherwise:
+   - write `(0xff, uint8(v), uint8(v >> 8))`.
+
+ Decoders should fail if the value is not minimally encoded.
+
+### Requirements
+
+The sending node:
+ - MUST order `tlv_record`s in a `tlv_stream` by monotonically-increasing `type`.
+ - MUST NOT encode a `length` or `value` for `type` 0x00.
+ - MAY use `type` 0x00 to signal the end of a `tlv_stream`.
+ - SHOULD NOT use the varbytes encoding for `value`s containing byte arrays.
+
+The receiving node:
+ - if it encounters EOF while parsing a `type`:
+   - MUST stop parsing the `tlv_stream`.
+ - if `type` is 0x00:
+   - MUST stop parsing the `tlv_stream`.
+ - if it encounters EOF while parsing a `length` or `value`:
+   - MUST fail to parse the `tlv_stream`.
+ - if `type` and `length` parse:
+   - if `type` is known:
+     - MUST decode the next `length` bytes using the known encoding for `type`.
+   - otherwise, if `type` is unknown:
+     - MUST discard the next `length` bytes.
+ - if non-sentinel `type`s are not monotonically-increasing:
+   - MUST fail to parse the `tlv_stream`.
+
+### Rationale
+
+The primary advantage in using TLV is that a reader is able to ignore new fields
+that it does not understand, since each field carries the exact size of the
+encoded element. Without TLV, even if a node does not wish to use a particular
+field, the node is forced to add parsing logic for that field in order to
+determine the offset of any fields that follow.
+
+The monotonicity constraint ensures that all `type`s are unique and can appear
+at most once. Fields that map to complex objects, e.g. vectors, maps, or
+structs, should do so by defining the encoding such that the object is
+serialized within a single `tlv_record`. The uniqueness constraint, among other
+things, enables the following optimizations:
+ - canonical ordering is defined independent of the encoded `value`s.
+ - canonical ordering can be known at compile-time, rather than being determined
+   dynamically at the time of encoding.
+ - verifying canonical ordering requires less state and is less expensive.
+ - variable-size fields can reserve their expected size up front, rather than
+   appending elements sequentially and incurring double-and-copy overhead.
+
+The sentinel `type` allows the encoder to signal the end of a `tlv_stream` when
+the length is not clear from the surrounding context. Its value is chosen as
+0x00 so that the sentinel value is implicit if there exist zero-bytes past the
+end of the encoded `tlv_stream`, which happens when constructing multi-frame
+payloads and the remaining bytes are zero-padded to a multiple of the frame
+size. It also permits the concatenation of `tlv_stream`s with distinct
+namespaces, providing an alternative to nested `tlv_stream`s, or allowing other
+data to be written after a `tlv_stream`. Additionally, a sentinel `type` allows
+a `tlv_stream` to be written directly without needing to compute its total
+serialized length.
+
+The use of a varint for `length` permits a space savings over a fixed 16-bit
+`length` for `value`s whose encoded length is less than 255 bytes. When a
+`tlv_stream` is used in multi-frame payloads, this can potentially save payloads
+from spilling over into extra frames and increase the possible route length for
+certain applications.
+
+All non-sentinel `type` bytes must appear in increasing order to create a
+canonical encoding of the underlying `tlv_record`s. This is crucial when
+computing signatures over a `tlv_stream`, as it ensures verifiers will be able
+to recompute the same message digest as the signer. Note that the canonical
+ordering over the set of fields can be enforced even if the verifier does not
+understand what the fields contain.
+
+We recommend that writers not use the varbytes encoding for byte arrays since it
+encodes the length twice, and also makes determining the outer `length` more
+difficult. Failure to account for the length of the varint in the outer `length`
+will result in corrupted values for the receiver.
+
 ## Setup Messages
 
 ### The `init` Message