Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BOLT01: wire TLV proposal #607

Open
wants to merge 7 commits into
base: master
from

Conversation

Projects
None yet
5 participants
@cfromknecht
Copy link
Contributor

commented May 9, 2019

This proposal outlines a TLV format intended for wire messages. It contains additional considerations for gossip messages that would require maintaining the ability to verify signatures over unknown fields, and the ability to propagate those messages to peers.

This proposal makes different tradeoffs from #603 in terms of:

  • requiring strict canonical encoding
  • fixed-size length encoding
  • supporting reserialization of unknown records
  • using 255 as the sentinel type identifier
  • removing even/odd requirement from the parsing logic, favoring negotiation via feature bits or application of such constraints external to the fundamental parsing requirements

Note that to unblock #519, it's not necessary to solidify wire TLV requirements for signed messages. They are included for completeness, but could be deferred.

Reference implementation: lightningnetwork/lnd#3061

Show resolved Hide resolved 01-messaging.md
`tlv_stream`s with different namespaces.

The monotonicity constraint ensures that all `type`s are unique and can appear
at most once.

This comment has been minimized.

Copy link
@t-bast

t-bast May 15, 2019

Is there a reason why you want to restrict types to appear at most once in a tlv stream? I think it could be useful and more flexible to let them appear multiple times.

This comment has been minimized.

Copy link
@cfromknecht

cfromknecht May 16, 2019

Author Contributor

replied in main body where there was ample space 😬

This comment has been minimized.

Copy link
@t-bast

t-bast May 16, 2019

I think it could be useful to add a sentence here to explain that if multiple occurrences of the same type is needed, it should rather be done at the type definition level (so we won't need to re-discuss this when newcomers read the spec and wonder why this limitation is here).

This comment has been minimized.

Copy link
@cfromknecht

cfromknecht May 16, 2019

Author Contributor

added in 2583ae2

@t-bast

This comment has been minimized.

Copy link

commented May 15, 2019

Thanks for this proposal @cfromknecht! It helps clarify some choices made in #603 by showing alternative choices.

After reading both proposals, I still think that we should decide on a single TLV encoding, enforced everywhere (wire, onion, etc), that combines your proposal with rusty's. Having distinct encodings is error-prone and confusing and I don't think the gains justify it (but it might be because some gains aren't obvious to me, feel free to detail them more if you disagree).

I'm in favor of keeping a varint for type and length: it can be more compact, it's more flexible and the performance impact is negligible (compared to the cryptographic operations that are done with every message).

I'm in favor of strict canonical encoding as you propose, but allowing multiple occurrences of the same type in a TLV stream.

I don't think the sentinel value is needed at all (and I think the result of the IRC discussion was a consensus on that, is that correct?).

I think it's still useful to keep the even/odd distinction to avoid feature flags bloat (I'm afraid we would need an explosion of feature flags which becomes more complex to track than the simple even/odd rule).

I agree with you that the optimization for integer values is a layering violation and I prefer your suggestion to let it be defined by the TLV type itself.

cfromknecht added some commits May 15, 2019

@cfromknecht

This comment has been minimized.

Copy link
Contributor Author

commented May 16, 2019

Thanks for the feedback @t-bast! I sincerely apologize in advance for the brevity of this reply, but it felt like a good time to do a brain dump from the last 6 months or so of research and explain the rationale behind certain aspects of the proposal that maybe haven't been discussed as much.

I don't think the sentinel value is needed at all (and I think the result of the IRC discussion was a consensus on that, is that correct?).

Yes that was the consensus, thanks to @cdecker @niftynei for helpful discussion! it's been removed in the fixups, along with the portions about retaining unknown values (which they convinced me is more of an implementation detail).

I'm in favor of keeping a varint for type and length: it can be more compact, it's more flexible and the performance impact is negligible (compared to the cryptographic operations that are done with every message).

I'm not married to the fixed-size encoding for the length, using a varint was actually how this proposal was drafted a few months back :) It is nice that in the fixed size case, this limit is enforced directly rather than needing to enforce the constraint as an additional check on the parsed value, since we should only need to handle up to two-byte values for wire messages.

I'm much more skeptical of making type a varint, with a worst-case size of 9 bytes and no other considerations like max message size to constrain the possible values. I'm fairly convinced that 64-bit namespace is overkill. If 8-bits really isn't enough, we can always create extensions to the namespace by nesting another tlv_stream in a select tlv_record. I really don't think (or hope) we'll need to, but it does exist as a safety valve.

performance impact is negligible (compared to the cryptographic operations that are done with every message).

Serialization under the wire format doesn't necessarily mean that the message is going to be encrypted, messages can be written to databases, over rpc connections, into message digests, etc. I agree tho that the overhead in moving from 2-byte length to varint (with realistic max of 65KB) is not so bad and can save space, however I don't think the same argument holds for type.

Consider a tlv_stream with types 0-255 under both proposals. The 8-bit identifiers can be held in a few cache lines, while the 64-bit identifiers require half of a standard page size, even though the size of the encodings are almost identical. An 8x memory overhead from types isn't negligible IMO, especially considering the vast majority of it would be zeros and is never even serialized in the message contents. Additionally, unlike lengths which can be dynamically computed at the time of encoding, types I suspect are more likely to be long-lived in order to retain the mapping to their respective fields.

Keep in mind we want to remain friendly to low-power devices with more constrained resources ;)

I'm in favor of strict canonical encoding as you propose, but allowing multiple occurrences of the same type in a TLV stream.

I'm with you on the strict canonical encoding!

There are a multitude of reasons however why I'm strongly opposed to allowing duplicate fields, mostly performance related (i swear, this is the shortened list):

  • Introduces unnecessary overhead:
    • The signaling overhead of each redundant field is at least 2 or 3 additional bytes. It's more efficient to have a single field and concatenate their encodings, e.g. num_items || encode(item0) || encode(item1) || ...
  • Unclear semantics of duplicate fields:
    • If duplicate fields are meant to be appended:
      • Then all TLV fields are inherently vectors, and one can't define just a uint32, it must be []uint32
        • This can be somewhat skirted if a particular type could specify that it can't be repeated, but the receiver would be unable to determine whether an unknown field is allowed to have one or multiple values
      • Since the number of duplicate records isn't known when reading the stream sequentially, this adds additional overhead of double and copy to facilitate expansion of the underlying array (or doing multiple passes i suppose, eek). Placing everything in a single record allows the size of vectors to be accurately reserved up front
    • If duplicate fields are meant to replace the prior field:
      • then why allow duplicates?
  • Increases complexity of producing canonical ordering:
    • If I'm not mistaken, the only logical sorting the receiver can apply to values is lexicographic sorting, since it has no access to the underlying type of unknown fields, so it must compare raw bytes
      • This implies that the writer must encode all of the fields in memory before being able to apply the sort, which results in higher memory requirements and would prevent TLV streams from being written directly to the wire without this extra preprocessing phase
      • As an example, sorting the unit32 0x0100ffff and the compressed uint32 0x01ffff result in different orderings when applying lexicographic vs integer sorting. If the receiver can only apply lexicographic sorting, the values must be encoded by the writer before trying to sort or else the stream may not be canonical
    • The canonical order can't always be determined at compile-time, and instead happens at the time of encoding. If the values change in content or length, this sort must be applied upon every encoding. From my benchmarks, sorting on every encoding yields a significant performance penalty, even when only sorting on 8-bit types, let alone a more complex comparison operator. The final version of the PoC moved to having the ordering supplied at compile time and vastly improved performance
  • Requires more state in verifying canonical ordering:
    • Instead of keeping only the type from the prior field, the receiver will need keep the length and value, which increases the state from constant to potentially the length maximum possible message size. When only types are compared, the value bytes can be discarded as they are processed (in certain cases, possibly implementation dependent?)

Compressing all repeated values into a single, unique record solves all of above the issues, with no real downsides AFAICT. Either the field is defined as a vector or it isn't, but those details remain in type's specified encoding rather than leaking into the parsing requirements. I think unique records are our friend here, and greatly reduces the code complexity while maintaining comparable performance to static serialization.

I think it's still useful to keep the even/odd distinction to avoid feature flags bloat (I'm afraid we would need an explosion of feature flags which becomes more complex to track than the simple even/odd rule).

On the wire, the sender can't just arbitrarily make fields required without knowing if the receiver supports it, doing so will result in a disconnect (since it'd be equivalent to failing to the parse the message). The receiver needs to opt-in to such fields before the sender can safely send them; we already have a mechanism for signaling this, what is the purpose in having the redundancy?

For example, all new fields added to gossip messages I think need to be optional, otherwise they won't propagate through older nodes. Having the even/odd rule there, or any wire message really, would only serve to halve the range of valid type identifiers. This is more critical if we solidify on a 8-bit type of course.

Onion TLV is different, since we can't rely on connection-level feature bit negotiation, whether we have the node's latest announcement, or if the node is still running the same software/configuration as it was when the invoice or announcement was created. At the same time, failure to parse a required onion TLV record only results in a failed HTLC, and not connection-level instability. I support having even/odd rule in onion, but I don't think the rule needs to be inherited by all TLV namespaces. IMO it should be applied as an additional constraint in the onion parsing. In my mind, the distinction is purely:

    // Standard TLV parsing for virtually everything
    var parser tlv.Parser
    if err := parser.Decode(r); err != nil {
        return err
    }

    // For onion TLV, apply even/odd rule as additional constraint
    if parser.FoundUnknownRequiredFields() {
        return tlv.ErrUnknownRequiredFields
    }

I think this gets us towards a more generic, unified TLV implementation that allows certain applications to apply restrictions as necessary, rather than restricting all other use cases for a single known instance where the even/odd rule is indeed very helpful.

After reading both proposals, I still think that we should decide on a single TLV encoding, enforced everywhere (wire, onion, etc), that combines your proposal with rusty's. Having distinct encodings is error-prone and confusing and I don't think the gains justify it (but it might be because some gains aren't obvious to me, feel free to detail them more if you disagree).

Yes, I concur that it would be nice to agree on one unified approach, but want to be clear that I don't necessarily want distinct formats. Though if the tradeoffs can't be reconciled entirely, I still think it's possible to agree on a set of fundamental requirements that permit a high degree of code reuse between the two.

TL;DR

With all that in mind and attempting to incorporate the pertinent feedback, i've updated the proposal to:

  • single byte type
  • varint length
  • no sentinels
  • no weird record retaining
  • strict canonical encoding by sender and receiver
  • unique records
  • no encoding-dependent requirements in the parser
  • leave enforcement of even/odd (or any additional constraints) at a higher-layer as needed

I thinkkk this captures the majority of the feedback that's been given over the past few weeks. I would then propose that we define the onion tlv format solely by adding a check for unknown required fields to the wire tlv requirements.

Let me know if I've blatantly missed/overlooked anything, further feedback is greatly appreciated! Hopefully that also further elaborates on some distinctions I see in wire vs onion tlv, or perhaps shows my ignorance as to why they should be treated the same. To anyone who read the whole thing, I owe you a beer :)

Aside: is it worth having a separate BOLT just for the TLV spec? I'm starting to think it might be helpful to have a place where we define useful encodings, e.g. compressed integers, so we can reference a single location for common tlv_record encodings in other sections of the spec.

@t-bast

This comment has been minimized.

Copy link

commented May 16, 2019

That's great, thanks a lot for this detailed post, it's very helpful (especially for newcomers such as myself). I'll hold you to that beer when we meet IRL ;)

Your argument for removing duplicate occurrences of a TLV type makes a lot of sense. As you suggest, it's a lot cleaner (and simplifies a lot the implementation) to let the type itself define whether it's an array or not. I'm fully on board with that (and the strict canonical encoding).

I've thought more about the feature flags vs even/odd rule, and reading your comment I now agree with you that the wire encoding doesn't need the even/odd rule. We will always need feature flags to know that the other peer understand our required TLV types before sending a message. And once we have such feature flags, we should consider all the other TLV types in the message as optional. It's especially obvious when we think about gossip re-transmission (staggered broadcast): if Bob re-transmits a message from Alice to his peers, for example to Carol, Carol should ignore every TLV type she doesn't understand (otherwise Alice could indirectly force a connection-close between Bob and Carol which is undesirable). We don't want Bob to have to compare each message's inner TLV types to each of his peers' feature flags to know if it's safe or not to re-transmit the message.

I would even argue that onion TLV doesn't need the even/odd rule either. Since we want to maximize payment path success, we will use channel feature flags to decide whether to include a peer in a payment path or not (because it's risky and costly to blindly include a node that would fail to parse a TLV type and thus fail the HTLC). So we know that this node understands all our required TLV types, and we can safely treat all other TLV types as optional (for example optional routing hints for next trampoline hop).

One nice benefit of removing the even/odd rule from onion TLV is that there is now no differences between onion and wire TLV (yay single encoding!).

Do note though that I think that the even/odd rule is useful for top-level messages (as defined in Bolt #1). But for TLV types I think the even/odd rule is unnecessary because all unknown TLV types should be safely ignored (and never fail a connection).

So we should be in agreement with using feature flags instead of the even/odd rule for TLV. However I'm still not convinced about type not being a varint. I think a varint is more flexible and simple to use than nesting TLVs. And until you need more than 1 byte's worth of TLV types, it's still a single byte so it doesn't have any impact. And once you need more types, using a varint has a smaller footprint than nesting another TLV...I agree it might make the code slightly more complicated but I think it's worth it because it feel cleaner and easier to reason about than nesting TLVs. But this is definitely not a blocking point on my end, we can go with a 1-byte type if more people lean towards that.

I don't have a strong opinion on putting the TLV proposal in its own BOLT, I think it's a good idea but I'm most certainly not the most suited person to judge that.

I feel that we're making a lot of progress and we're getting close :)
I'd love to have a thorough review of our discussion by @rustyrussell though: he initiated the TLV discussion, has a lot of experience with this kind of stuff and can potentially point out things we might have overlooked that would justify the use of the even/odd rule.

Phew that ended up being a long comment too, I'll have to buy the second round of beers to our readers!

@niftynei
Copy link
Contributor

left a comment

Few nits on wording. Otherwise looks good.

Here's a quick list of the differences between this and #603:

  • 2-byte vs varint size for type field
  • #603 specifies additional rules for condensing integer values
  • #603 has no explicit rules against duplicate type inclusion
  • #603 includes an even/odd rule for type numbering
  • #603 allows readers to not enforce ordering constraints
Show resolved Hide resolved 01-messaging.md Outdated

The use of a varint for `length` permits a space savings over a fixed 16-bit
`length` for `value`s whose encode is less than 253 bytes. When `tlv_stream` is
used in EOB multi-frame payload, this can potentially save payloads from

This comment has been minimized.

Copy link
@niftynei

niftynei May 16, 2019

Contributor

is EOB defined elsewhere? Since this PR also adds it to the spellcheck I'm gonna guess not. We should probably define it.

This comment has been minimized.

Copy link
@cfromknecht

cfromknecht May 16, 2019

Author Contributor

You're right, I don't think it is. I ended up removing it and just referring to them as multi-frame payloads, since this better matches the language in #593

2583ae2

@cfromknecht

This comment has been minimized.

Copy link
Contributor Author

commented May 16, 2019

Thank you @niftynei for the review and @t-bast for further input 🎉

I've thought more about the feature flags vs even/odd rule, and reading your comment I now agree with you that the wire encoding doesn't need the even/odd rule.

Indeed, though I should note that we don't need a feature bit if the field is truly optional, it only needs to be defined if we wish to make it required. My hope is that this wouldn't create feature bits like i_undestand_type_34_on_query_short_channel_id, but rather be bundled into a more logical "feature", e.g. extended_gossip_queries.

Since they are optional, the sender is free to write it, but everyone can save on bandwidth if the field is serialized conditionally on the peer's support for that feature.

I would even argue that onion TLV doesn't need the even/odd rule either.

Yeah I'm not totally certain in either direction, I suppose one could even make a case for all onion TLV fields to be required. But deferring additional constraints to a higher-level means we can retroactively add these constraints where necessary. Perhaps those will be identified as we move forward with defining the MPP onion types.

Do note though that I think that the even/odd rule is useful for top-level messages (as defined in Bolt #1). But for TLV types I think the even/odd rule is unnecessary because all unknown TLV types should be safely ignored (and never fail a connection).

Which ones exactly? Since init is transmitted first, couldn't those still be gated by feature bits?

However I'm still not convinced about type not being a varint.

A wise professor once told me that if you're designing a protocol and think a certain limit is enough, double it—in reference to IPv4 address space lol.

I left this out of the proposal, but my original idea was strip out the 4 and 8 bit encodings from the CompactType and use a varint for length, so:

 - if x < 0xff:
   - write x
 - else write 0xff || little-endian(x)

If we use this for both type and length, I think it bounds both quite nicely. I'm much more amenable to varints that don't require 64-bit types.

The implementation obviously is pretty trivial, just its mostly a copy-pasta of CompactSize. This gives us equivalent performance to a single byte type for values < 255, but also some breathing room if 256 types isn't enough. Thoughts?

I feel that we're making a lot of progress and we're getting close :)

Sure feels like it!

I'd love to have a thorough review of our discussion by @rustyrussell though: he initiated the TLV discussion, has a lot of experience with this kind of stuff and can potentially point out things we might have overlooked that would justify the use of the even/odd rule.

Same here, as I don't think we can really move forward conclusively without his input on the tradeoffs :)

I'll hold you to that beer when we meet IRL ;)

Phew that ended up being a long comment too, I'll have to buy the second round of beers to our readers!

Looking forward to it! I suppose the length of this message warrants a third round 🍻

@cfromknecht cfromknecht force-pushed the cfromknecht:tlv branch from e1e8ada to 9587457 May 16, 2019

@cfromknecht

This comment has been minimized.

Copy link
Contributor Author

commented May 16, 2019

One other comment regarding using a varint type: encountering an EOF while parsing the type doesn't necessarily mean the stream is well-formed like it would with 1-byte types. We’d need to modify the language to consider an EOF well-formed only if zero bytes were read.

@t-bast

This comment has been minimized.

Copy link

commented May 17, 2019

Indeed, though I should note that we don't need a feature bit if the field is truly optional, it only needs to be defined if we wish to make it required. My hope is that this wouldn't create feature bits like i_undestand_type_34_on_query_short_channel_id, but rather be bundled into a more logical "feature", e.g. extended_gossip_queries.

I completely agree: a feature flag should indicate support for a bunch of required TLV types (in a given context) - for example option_trampoline_onion would indicate support for all the types necessary for trampoline onion routing. Then optional types can be added without defining any feature flag indicating their support (think of such types as optional hints that can be safely ignored, like routing hints in the case of onion) which gives more flexibility to quickly add new small optional optimizations.

A wise professor once told me that if you're designing a protocol and think a certain limit is enough, double it—in reference to IPv4 address space lol.

😆

I'm not very clear on your proposal for the tweaked varint (™️), maybe I'm missing something about the varint discussion in general. Bitcoin's varint use either 1, 3, 5 or 9 bytes. Your comments repeatedly mention preventing 64-bit integers (which I agree would be overkill), but that only happens in the case where we use 9 bytes, which I think would probably never happen. I expect our use of varint in TLV type to most of the time use 1 byte, and sometimes use 3 bytes.
To make sure I understand the motivation, are you proposing the tweaked varint (™️) variation because you'd like to be able to use only 2 bytes instead of 3 when we overflow the 1-byte limit (I agree that 3 bytes might not be necessary, 2 bytes would probably be enough)?

  • else write 0xff || little-endian(x)

How do you know you finished parsing x if the first byte doesn't explicitly tell you how many subsequent bytes are used?

Thanks for all those suggestions and the continued discussion :)

@cfromknecht

This comment has been minimized.

Copy link
Contributor Author

commented May 17, 2019

In the process of integrating the current wire TLV with @Roasbeef’s multi-frame implementation of #593, we encountered some interesting events that reminded me of some other reasons for having a sentinel.

When the bytes containing a tlv_stream can be larger than the actual stream, this will result in a parsing failure. For example, extra appended zeros will fail when enforcing canonical ordering over sorted types.

This happens when parsing payloads in #593, since only complete frames are interpreted without specifying the number of used bytes in the payload.

This leaves us with two options:

  1. Use a sentinel type
  2. Write the tlv_stream’s length as a varint at the beginning of the payload

Writing the length will require prepending 1 or 3 bytes to the payload.

On the other hand, using a sentinel type will require at most 1 byte, let me explain.

The original proposal serialized the sentinel’s length, which I’ve realized now is not necessary. We can simply stop after parsing a sentinel type, making the sentinel smaller. For either 8-bit or varint types, we’d choose a value that only requires 1 byte.

The original proposal also made the sentinel record optional, which means that when the payload fits exactly in the provided frames, it can simply be omitted. Thus, when the tlv_stream fits exactly in a given number of frames, there is no overhead.

When it doesn’t fit exactly, we use one extra byte, but at that point the frame has already been consumed and the remainder of the bytes can’t be reclaimed. In either case, no additional frames will ever be consumed using a sentinel value.

We can simplify the encoding procedure even more, by defining the sentinel value as 0x00. Observe that after encoding the tlv_stream the remaining bytes in the frame will still be set to 0. At least in the case of onion TLV, that means the writer never even has to be aware of the sentinel (or specifically omit it), it will be done implicitly if there are more than 0 bytes remaining in the frame (otherwise the payload fits exactly). I’m beginning to think this may have part of the reason perhaps @rustyrussell and @niftynei may have chosen 0x00 as the sentinel value in the other proposal?

In addition to the above, using a sentinel value offers another significant performance benefit over explicitly writing the length, in that the tlv_stream can be written directly without needing to compute its total length. Doing so requires either serializing the stream beforehand or introducing some more complex method of computing the total length of the stream before encoding (which gets hairy if the values can change in size after encoding, e.g. integer compression).

Given that it’s optional, is it worth sharing this between wire and onion TLV? I don’t see how it hurts, and IMO would make the base TLV requirements more flexible as a whole

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.