Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BOLT01: TLV proposal #607

Merged
merged 1 commit into from Jul 8, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
11 changes: 11 additions & 0 deletions .aspell.en.pws
Expand Up @@ -330,3 +330,14 @@ zlib
ZLIB
APIs
duplicative
TLV
namespace
verifier
verifiers
EOF
monotonicity
varint
optimizations
structs
CompactSize
encodings
89 changes: 89 additions & 0 deletions 01-messaging.md
Expand Up @@ -13,6 +13,7 @@ All data fields are unsigned big-endian unless otherwise specified.

* [Connection Handling and Multiplexing](#connection-handling-and-multiplexing)
* [Lightning Message Format](#lightning-message-format)
* [Type-Length-Value Format](#type-length-value-format)
* [Setup Messages](#setup-messages)
* [The `init` Message](#the-init-message)
* [The `error` Message](#the-error-message)
Expand Down Expand Up @@ -82,6 +83,94 @@ however, adding a 6-byte padding after the type field was considered
wasteful: alignment may be achieved by decrypting the message into
a buffer with 6-bytes of pre-padding.


## Type-Length-Value Format

Throughout the protocol, a TLV (Type-Length-Value) format is used to allow for
the backwards-compatible addition of new fields to existing message types.

A `tlv_record` represents a single field, encoded in the form:

* [`varint`: `type`]
* [`varint`: `length`]
* [`length`: `value`]

A `tlv_stream` is a series of (possibly zero) `tlv_record`s, represented as the
concatenation of the encoded `tlv_record`s. When used to extend existing
messages, a `tlv_stream` is typically placed after all currently defined fields.

The `type` is a varint encoded using the bitcoin CompactSize format. It
functions as a message-specific, 64-bit identifier for the `tlv_record`
determining how the contents of `value` should be decoded.

cfromknecht marked this conversation as resolved.
Show resolved Hide resolved
The `length` is a varint encoded using the bitcoin CompactSize format
signaling the size of `value` in bytes.

The `value` depends entirely on the `type`, and should be encoded or decoded
according to the message-specific format determined by `type`.

### Requirements

The sending node:
- MUST order `tlv_record`s in a `tlv_stream` by monotonically-increasing `type`.
- MUST minimally encode `type` and `length`.
- SHOULD NOT use redundant, variable-length encodings in a `tlv_record`.

The receiving node:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about rejecting redundant types in the stream?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is covered by the "strictly increasing" requirement.

- if zero bytes remain before parsing a `type`:
- MUST stop parsing the `tlv_stream`.
- if a `type` or `length` is not minimally encoded:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clarification: is minimally encoded referring to their being a varint type, i.e. you should encode '0x01' as '0x01', not '0xfe00000001'?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@niftynei correct, i would assume most impls of compact size already check this, tho i'm not positive

Copy link
Collaborator

@Roasbeef Roasbeef Jul 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an example, btcd's wire package includes checks to ensure encoded integers are canonical: https://github.com/btcsuite/btcd/blob/master/wire/common.go#L491

- MUST fail to parse the `tlv_stream`.
- if decoded `type`s are not monotonically-increasing:
- MUST fail to parse the `tlv_stream`.
- if `type` is known:
- MUST decode the next `length` bytes using the known encoding for `type`.
- otherwise, if `type` is unknown:
- if `type` is even:
- MUST fail to parse the `tlv_stream`.
- otherwise, if `type` is odd:
- MUST discard the next `length` bytes.

### Rationale

The primary advantage in using TLV is that a reader is able to ignore new fields
that it does not understand, since each field carries the exact size of the
encoded element. Without TLV, even if a node does not wish to use a particular
field, the node is forced to add parsing logic for that field in order to
determine the offset of any fields that follow.

The monotonicity constraint ensures that all `type`s are unique and can appear
at most once. Fields that map to complex objects, e.g. vectors, maps, or
structs, should do so by defining the encoding such that the object is
serialized within a single `tlv_record`. The uniqueness constraint, among other
things, enables the following optimizations:
- canonical ordering is defined independent of the encoded `value`s.
- canonical ordering can be known at compile-time, rather that being determined
dynamically at the time of encoding.
- verifying canonical ordering requires less state and is less-expensive.
- variable-size fields can reserve their expected size up front, rather than
appending elements sequentially and incurring double-and-copy overhead.

The use of a varint for `type` and `length` permits a space savings for small
`type`s or short `value`s. This potentially leaves more space for application
data over the wire or in an onion payload.

All `type`s must appear in increasing order to create a canonical encoding of
the underlying `tlv_record`s. This is crucial when computing signatures over a
`tlv_stream`, as it ensures verifiers will be able to recompute the same message
digest as the signer. Note that the canonical ordering over the set of fields
can be enforced even if the verifier does not understand what the fields
contain.

Writers should avoid using redundant, variable-length encodings in a
`tlv_record` since this results in encoding the length twice and complicates
computing the outer length. As an example, when writing a variable length byte
array, the `value` should contain only the raw bytes and forgo an additional
internal length since the `tlv_record` already carries the number of bytes that
follow. On the other hand, if a `tlv_record` contains multiple, variable-length
elements then this would not be considered redundant, and is needed to allow the
receiver to parse individual elements from `value`.

## Setup Messages

### The `init` Message
Expand Down