Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make transport parameter ID and length varint #3294

Closed
DavidSchinazi opened this issue Dec 10, 2019 · 43 comments · Fixed by #3169
Closed

Make transport parameter ID and length varint #3294

DavidSchinazi opened this issue Dec 10, 2019 · 43 comments · Fixed by #3169
Assignees
Labels
-transport design has-consensus

Comments

@DavidSchinazi
Copy link
Contributor

@DavidSchinazi DavidSchinazi commented Dec 10, 2019

Now that we have turned almost every integer in the spec to varint, and we have removed any mention of the TLS presentation language, it makes sense to make transport parameters varint as well. The current uint16-based encoding sticks out like a sore thumb. That said, this change is more than cosmetic - transport parameters are our main joint for extending QUICv1, and switching to 62-bit will ensure we do not run out of space, ever. If we do not do this now, we will not be able to change it later in QUICv1 as these are used during the handshake.

This issue was originally described in #3020 but the feedback in Cupertino was that #3020 was conflating three different topics:

  1. the language used to describe transport parameters. That has been changed away from the TLS presentation language via merged PR #3108.
  2. the policy for the transport parameter IANA registry. This has been changed by merged PR #3170.
  3. the encoding of transport parameter ID and length over the wire. This is discussed in this issue and the associated open PR #3169.

From the discussions I've seen, it looks like there was some opposition and some support. I'm opening this issue now so we can have that conversation here independent of topics (1) and (2) above.

@dtikhonov
Copy link
Contributor

@dtikhonov dtikhonov commented Dec 10, 2019

As @kazuho pointed out in #3169, this is effectively moving the goal posts. Quoting the conclusion of this discussion in Cupertino:

Jana: So the proposal is just draw a diagram to not use TLS presentation format; and then separately flipping the allocation of the spec required and experimental space.

Using 3020 to track changing the space allocation. Separate PR for editorial change.

Changing TPs to varint is to change the decision we've already reached. Which is fine -- but only as long as we state it this way.

@DavidSchinazi
Copy link
Contributor Author

@DavidSchinazi DavidSchinazi commented Dec 11, 2019

@dtikhonov apologies for splitting hairs, but since we're discussing process: the QUIC WG does not establish consensus in meeting rooms. All decisions are made in consensus calls from the chairs on the mailing list. What you described was what was said in Cupertino, which reflects some attendants' opinions, not any decision from the WG. The purpose of this issue it to establish the consensus of the WG: in order to close the issue we will need to establish whether we have consensus to make this change (close by merging PR) or consensus to close without action.

@janaiyengar
Copy link
Contributor

@janaiyengar janaiyengar commented Dec 11, 2019

2^16 is a large space. Do you expect to run out of 64K transport parameters for QUICv1?

@kazuho
Copy link
Member

@kazuho kazuho commented Dec 11, 2019

@DavidSchinazi

@dtikhonov apologies for splitting hairs, but since we're discussing process: the QUIC WG does not establish consensus in meeting rooms. All decisions are made in consensus calls from the chairs on the mailing list.

I'm sorry to point this out, but if we are to discuss about process, isn't it the case that the consensus call for resolving #3020 has ended on Monday? Quoting from https://mailarchive.ietf.org/arch/msg/quic/voKkeoWpl5J7qtZkTJcDfIiXhls:

> * #3020: Transport parameter registry is too constraining for innovation
>   The proposal is <https://github.com/quicwg/base-drafts/pull/3170>

PS. What I am stating here is that while I would be sympathetic to the argument that we have different understandings on what we have agreed in Cupertino, process-wise, we have finished a consensus call based on the comments on #3020, which concludes to "only replace the TLS syntax with something easier to understand", and "shuffle the assigned ranges in the registry".

@DavidSchinazi
Copy link
Contributor Author

@DavidSchinazi DavidSchinazi commented Dec 11, 2019

@janaiyengar 2^16 is not a large space. Trying to request a UDP port from IANA is an informative way to find out just how small 2^16 is, and how strongly defended a 2^16 registry becomes over time. I acknowledge that we have different IANA procedures than UDP ports, but given enough time, we're bound to hit the limit.

@kazuho That's a reasonable point, and I respect it. My understanding of that consensus call was that it only covered the IANA portion of the issue. But perhaps we should focus on the technical reasons why we want, or don't want, to make the proposed change? :)

@janaiyengar
Copy link
Contributor

@janaiyengar janaiyengar commented Dec 11, 2019

Alright kids, no more process fighting. My recollection is hazy, but I remember us punting this question. We might have differing recollections, and it's worth coming to a conclusion again here to write it down clearly.

@janaiyengar
Copy link
Contributor

@janaiyengar janaiyengar commented Dec 11, 2019

@DavidSchinazi: In terms of scale, it seems to me that we are talking about the equivalent of TCP options, not the equivalent of the number of all applications that have (and might) run over TCP, ever.

We also have the benefit of versions.

@kazuho
Copy link
Member

@kazuho kazuho commented Dec 11, 2019

I also think that we do not need more than 2^16 space for Transport Parameters.

In addition to what we know from how TCP options are being used (as @janaiyengar points out), I would also point out that every TP consumes space in the handshake packets that the endpoints exchange. To paraphrase, there is an incentive to keep the number of TPs being used small. I do not think we'd ever see endpoints sending thousands of TPs.

Assuming that is the case, there is no practical reason to switch to varints. As I have pointed out in #3020, some of us have been reusing a encoder / decoder designed for TLS, switching to a varint-based design would require those to write a new codec specifically for QUIC. That's just a burden.

I think we should concentrate on fixing real issues rather than requiring everybody to make an unneeded change.

@marten-seemann
Copy link
Contributor

@marten-seemann marten-seemann commented Dec 11, 2019

I’m in favor of making this change. There’s value in having a consistent encoding across the protocol.

I don’t think that building a parser for this frame would be a real burden, considering that implementations already have parsers for 15 or so frame types.

@kazuho
Copy link
Member

@kazuho kazuho commented Dec 11, 2019

There’s value in having a consistent encoding across the protocol.

While I understand that it looks inconsistent from QUIC, current design is consistent to TLS. And Transport Parameters is a TLS extension.

@DavidSchinazi
Copy link
Contributor Author

@DavidSchinazi DavidSchinazi commented Dec 11, 2019

I don't think the current design (what's in draft-24) is consistent with TLS - many of our transport parameters have varint values, and TLS doesn't have those.

@kazuho
Copy link
Member

@kazuho kazuho commented Dec 11, 2019

The model is that the content of each Transport Parameter is built by QUIC, whereas they are encoded by TLS.

FWIW, the transition to the current model was made in #1608. During the time (late 2018), we discussed the possibility of moving the TP ID and length fields to varints as well, see https://mailarchive.ietf.org/arch/msg/quic/KMSoDdUGs-R7P_GfcBJTxvJARiw, and we decided to not, some stating that it's already late. #3020 was not the first time discussing the issue.

@marten-seemann
Copy link
Contributor

@marten-seemann marten-seemann commented Dec 11, 2019

The model is that the content of each Transport Parameter is built by QUIC, whereas they are encoded by TLS.

That's a pretty complicated model to begin with. TLS has nothing to do with the transport parameters, other than that it carries them as an opaque blob.
I cannot see any reason for using this model than being able to use TLS presentation language. Which we removed in #3108.

@kazuho
Copy link
Member

@kazuho kazuho commented Dec 11, 2019

@marten-seemann

I cannot see any reason for using this model than being able to use TLS presentation language. Which we removed in #3108.

I do not think that is a fair argument. As I have stated in #3169 (comment), it is my understanding that we agreed on only switching to a QUIC-based notation, while preserving the uint16 encoding, which coincides with what #3020 states.

I can see that we disagree on that. If that is the case, maybe we should start by first going back to the TLS notation?

@marten-seemann
Copy link
Contributor

@marten-seemann marten-seemann commented Dec 11, 2019

If that is the case, maybe we should start by first going back to the TLS notation?

If I remember correctly, the main argument for giving up on the TLS notation was improved readability for people who are not familiar with TLS representation language. From the viewpoint of consistency in encoding, switching back to it would definitely be an improvement though.
That being said, I still don't understand why TLS should be involved with the encoding of a value that it never treats as anything other than an opaque blob.

@kazuho
Copy link
Member

@kazuho kazuho commented Dec 11, 2019

If I remember correctly, the main argument for giving up on the TLS notation was improved readability for people who are not familiar with TLS representation language.

Re moving away from TLS notation, my memory is consistent to what you describe.

From the viewpoint of consistency in encoding, switching back to it would definitely be an improvement though.

I think you make a good point here. Assuming that we keep the status quo, it would be helpful to the readers if we explained why it is designed as such. Maybe something like following would be a good idea: To encourage code reuse, the encoding and validation requirements of Transport Parameters follows that of TLS extensions (RFC 8446; section 4.2](https://tools.ietf.org/html/rfc8446#section-4.2)).

That being said, I still don't understand why TLS should be involved with the encoding of a value that it never treats as anything other than an opaque blob.

This is the only case in QUIC where we need to skip an opaque value of unknown type. That's why we use a type-length-value. We are also required to check that each parameter appears at most once, and also that the length of known parameters are as expected.

These requirements happen to be exactly equivalent to what is required for TLS extensions. And therefore it is beneficial if the existing code in TLS stack can be reused. Considering that QUIC already has a pretty tight coupling with TLS, I do not think it is unnatural to endorse such reuse.

@dtikhonov
Copy link
Contributor

@dtikhonov dtikhonov commented Dec 11, 2019

Another advantage to using TLS Presentation Language (TLSPL) is that one does not have to learn (or make) YAPL (yet another presentation language). As a matter of fact, it would be (and was) convenient to state explicitly that TLSPL is used. Putting the reference to TLSPL back into the document is a fine idea.

@nibanks
Copy link
Member

@nibanks nibanks commented Dec 11, 2019

I support this change for a numbers of reasons:

  1. The QUIC layer is the one that reads and writes the contents of the TP, not the TLS layer. QUIC libraries cannot be dependent on TLS libraries to encode/decode the individual parameters. QUIC and TLS should be independent enough, so that they can evolve in parallel, without requiring changes to both, whenever one changes. Not to mention, not all/many TLS libraries will/do expose this type of functionality already.
  2. No where else in the QUIC code/spec do I need to use or understand TLS encoding. This shouldn't be an exception to that.
  3. The current encoding TLS scheme requires an extra, unnecessary length field for the whole TP, because of how TLS presentation language specifies the list of parameters. IMO, this just makes the code more awkward and I'd like to be able to remove it.

@RyanTheOptimist
Copy link
Contributor

@RyanTheOptimist RyanTheOptimist commented Dec 11, 2019

I heartily agree with what @nibanks wrote. QUIC reads and writes TP and when QUIC reads and writes integers, varints are the way to go (as evidences by TPs with varint values). Using varints here is more consistent with the rest of QUIC code and is simpler.

@ianswett
Copy link
Contributor

@ianswett ianswett commented Dec 11, 2019

+1 to what @nibanks said.

@MikeBishop
Copy link
Contributor

@MikeBishop MikeBishop commented Dec 11, 2019

I understood the feedback in Cupertino as being to decouple the questions, not to reject any of them outright. The PR mixed several changes, and it was hard to discuss the combined PR effectively.

Fundamentally, this seems to be a case of design reflecting a coding pattern. If your pattern is that the QUIC layer generates a binary structure, hands the opaque blob to TLS and says "this is the payload of extension foo," then it makes total sense to varint everything, because it's part of QUIC.

If your coding pattern is that the transport parameters are a known structure and TLS is handed a key-value store that it knows how to serialize, then it makes total sense to TLSPL everything, because it's part of TLS.

On the whole, the first pattern seems more likely to me, but fundamentally we're shipping a code architecture choice here.

@mikkelfj
Copy link
Contributor

@mikkelfj mikkelfj commented Dec 11, 2019

I can easily see a QUIC version that does not use TLS, so making TP independent of TLS would be fine by me.

@janaiyengar
Copy link
Contributor

@janaiyengar janaiyengar commented Dec 11, 2019

I'd like to be clear about one thing: the proposed change is not about extensibility, this is a cosmetic change. I get that it might be "nicer" to some to have this be more uniform with the rest of QUIC. But I'm not hearing more arguments about the space being too constraining.

This change is not debilitating to anyone, and I expect that we'll all be able to live with either decision. I'm not keen about cosmetic changes at this point. As @kazuho noted, several of us said in late 2018 that it was too late to make this change because it was too late then. It's even later now.

I don't think this is important enough to change the handshake yet again at this late stage.

@nibanks
Copy link
Member

@nibanks nibanks commented Dec 11, 2019

I personally don't think the argument can be made that we're too late in the game to make changes that effect the handshake when we just merged a PR that completely rewrites the whole Retry path. That's a huge change! All because folks argued that UDP checksum was not enough. This is like a 15 minute code change. That Retry change is at least half a day most likely. Also, this change is not purely a cosmetic change, since it removes an extra (and unnecessary) list length.

@dtikhonov
Copy link
Contributor

@dtikhonov dtikhonov commented Dec 11, 2019

I personally don't think the argument can be made that we're too late in the game to make changes that effect the handshake when we just merged a PR that completely rewrites the whole Retry path.

The two changes are different. The Retry change addresses an obvious technical deficiency in the protocol and thus it passed the Late Stage filter. Changing TP IDs to varints is not on the same urgency level.

This is like a 15 minute code change.

15 minutes, eh? You are a better programmer than I am; or a better hyperbolator.

@kazuho
Copy link
Member

@kazuho kazuho commented Dec 11, 2019

@MikeBishop

Fundamentally, this seems to be a case of design reflecting a coding pattern. If your pattern is that the QUIC layer generates a binary structure, hands the opaque blob to TLS and says "this is the payload of extension foo," then it makes total sense to varint everything, because it's part of QUIC.

If your coding pattern is that the transport parameters are a known structure and TLS is handed a key-value store that it knows how to serialize, then it makes total sense to TLSPL everything, because it's part of TLS.

I think that this is a good summary of the trade-off that we have here.

FWIW, the latter approach (using a helper provided by TLS) can use something other than a key-value store.

As an example, picotls does not provide an API for encoding / decoding extensions, but provides a closure-based API1 for encoding decoding TLS blocks instead, which is used by quicly. The code that encodes TP is at quicly.c line 1389-1435, the code that decodes TPs is at line 1459-1546. Because the encoding / decoding process are defined as callbacks of the TLS stack, they directly read from / append to the buffer built by the TLS stack.

As stated in my previous comment, encoding / decoding TPs is a complicated task that happens as part of the TLS handshake message processing, it would be helpful to allow these kinds of code reuse.

If we are to switch to varints for both TP ID and length, there would be much much less chance of such code reuse, because the structures of the TLV encoding would depend on the varint encoding of QUIC. In case of quicly, we would be copy-pasting the helper functions provided by picotls, changing the code that handles uint16 to varints.

That's certainly possible, but the additional complexity for us is far more than the cost of using uint16 instead of varints here.

1: With the caveat that something that looks like a closure (or a block of code) is actually a macro in case of C though :-)

@kazuho
Copy link
Member

@kazuho kazuho commented Dec 12, 2019

@nibanks

This is like a 15 minute code change.

I think that this comment illustrates the difference of pain among the parties.

For people having a type-length-value codec specific to QUIC, the pain of sticking uint16 here is as small as something that can be modified in 15 minutes.

Compared to that, the pain would be much bigger for people sharing the codec with TLS, if we decide to move to a varint-based design. This is because that move would enforce those people to have two codecs instead of one, even though the logic would be equivalent with the exception of the integer representations being used.

The requirements on the type-length-value codec is not trivial. The decoder is required to:

  • parse IDs and block lengths
  • check that the block lengths were consistent for parameters that have been parsed
  • detect duplicate IDs

For those who can share the codec, it helps a lot if these logic can be kept and maintained in one place.

I see that some people are arguing that the encoding can be different based on the view that TP belongs to QUIC. I dispute that. We deliberately decided to mix crypto handshake and transport negotiation in QUIC. TP is a cross-layer thing that is exchanged during the TLS handshake.

@mnot mnot added this to Triage in Late Stage Processing Dec 12, 2019
@marten-seemann
Copy link
Contributor

@marten-seemann marten-seemann commented Dec 12, 2019

In addition to the points that @nibanks mentioned, there's another advantage of defining our own format for transport parameters:

  • We can require the list to be ordered by parameter ID (I've suggested this before somewhere, but #3169 doesn't yet incorporate this suggestion).

This would make the detection of duplicate parameters (which is now just a SHOULD due to the computational complexity) trivial, and would allow us to turn the SHOULD into a MUST. Furthermore, it reduces the profiling surface exposed by a QUIC implementation: I would assume that today it's probably possible to identify a QUIC implementation just by looking at the order of the transport parameters it sends.

@kazuho

I see that some people are arguing that the encoding can be different based on the view that TP belongs to QUIC. I dispute that. We deliberately decided to mix crypto handshake and transport negotiation in QUIC. TP is a cross-layer thing that is exchanged during the TLS handshake.

I disagree with this statement. The reason we're putting transport parameters into the TLS handshake is because they're sent in Initial and that's basically the only way we can detect a modification by an on-path attacker. By putting them into the TLS handshake they become part of the transcript and therefore prevents this attack.
If there was a way to send transport parameters in encrypted packets, there would be no reason not to define a QUIC SETTINGS frame for them.

@MikeBishop
Copy link
Contributor

@MikeBishop MikeBishop commented Dec 12, 2019

We deliberately decided to mix crypto handshake and transport negotiation in QUIC. TP is a cross-layer thing that is exchanged during the TLS handshake.

I'm ambivalent about this assertion. When we had a section on requirements for crypto handshake protocols, one of the requirements was to carry and authenticate the provided opaque blob. Using a TLS extension was then the TLS incarnation of that requirement.

Since then, we've decided this version is TLS-only and future versions can figure things out for themselves. From that choice, we've opted to integrate packet types and encryption with the TLS record layer and key schedule for this version. I don't agree that this is an area we've deliberately decided to mix layers, but I do agree that we could reasonably make that choice. Clearly your implementation has, regardless, and I don't see that as a bad choice.

@mikkelfj
Copy link
Contributor

@mikkelfj mikkelfj commented Dec 12, 2019

In relation to the version negotiation draft, it is a requirement that a server is capable of processing older still popular versions of the Initial packet. If some versions are very TLS specific and some are not, this leads to complexity even if on the surface, some implementations might finder it simpler to leverage TLS tooling for a specific version.

@kazuho
Copy link
Member

@kazuho kazuho commented Dec 16, 2019

@MikeBishop I think I mostly agree with the observation.

Draft-12 essentially changed QUIC to a TLS record layer. QUIC now has four record types because that is what TLS 1.3 has. The key discards are designed based on how TLS 1.3 starts using new keys (i.e. handshake key is used by the server first, and the client is the last to use it).

And moreover, there is another instance we use uint16, in addition to TPs. That is through the use of HKDF-Expand-Label function of TLS.

QUIC uses HKDF-Expand-Label for deriving protection keys from traffic secrets generated by TLS. As the derivation and use of protection keys is specific to QUIC, we could have directly invoked the HKDF-Expand function defined in RFC 5869. But in status quo, QUIC uses HKDF-Expand-Label, defined in TLS 1.3, which has a uint16 field in the Label encoding.

To me, this looks like an example of encouraging code reuse, much like the use uint16 for encoding TP ID and lengths.

If our principle is going to be to minimize the interface between TLS and QUIC, we should consider changing the points I've stated above.

Or if our principle is to encourage (or leave room for) code reuse between TLS and QUIC, the encoding of TP IDs and lengths should be kept as-is, along with the use of HKDF-Expand-Label, how we drop keys, the use of four epochs ...

@martinthomson
Copy link
Member

@martinthomson martinthomson commented Dec 16, 2019

The question here is to what extent we consider the TP encoding to be part of the QUIC stack as opposed to the TLS stack. One school treats the transport parameters as an opaque carrier of bits, leaning on the abstract interface we have defined between transport and TLS. The other says that implementation strategy might differ and the similarities with TLS can be exploited in some implementations.

The parallel drawn to the KDF function TLS provides is clearer - at least in my mind - about this distinction. TLS provides a KDF in the form KDF(secret, label, output_length) that QUIC uses. We did at various points in the process consider modifications to that KDF, but the current form is very clear about leaving the KDF function unmodified.

I think are more pertinent questions here is the relative cost change against the improvements that might be gained. The advantage of retaining the current form is that it can be more tightly integrated into a TLS stack. The advocates for change see the potential for removing a single instance of bespoke encoding from the QUIC stack, plus a modest reduction in handshake size.

We've heard that at least one implementation would find changes to be disruptive. I might separately observe that disruption to an existing implementation is not a great motivation for opposing a change on its own, because we all knew the risks when we decided to build to a draft specification.

If we were to make arguments from some principled basis, I would say that the bag of bits view holds: QUIC is responsible for constructing and consuming transport parameters as an opaque blob. But that's not really a useful basis for making a decision when there are more substantive matters at stake.

@mikkelfj
Copy link
Contributor

@mikkelfj mikkelfj commented Dec 16, 2019

@kazuho
Copy link
Member

@kazuho kazuho commented Dec 16, 2019

@mikkelfj As I have stated previously, the design of QUIC is tied to TLS 1.3 in many ways, deeply enough that I do not think decoupling the encoding of TP would have practical difference on the amount of changes we need to make, when we are to support some other handshake mechanism. Not to mention that if we are going to use something other than TLS is a hypothesis.

@huitema
Copy link
Contributor

@huitema huitema commented Dec 23, 2019

I really dislike gratuitous changes at this stage in the process!

@ekr
Copy link
Collaborator

@ekr ekr commented Jan 2, 2020

@LPardue LPardue added the design label Feb 4, 2020
@project-bot project-bot bot moved this from Triage to Design Issues in Late Stage Processing Feb 4, 2020
@britram
Copy link
Contributor

@britram britram commented Feb 5, 2020

I wouldn't characterize changes to make the design of the protocol self-consistent as gratuitous; this seems like it'll reduce implementation confusion. And changing this encoding in future versions seems like a good place for parser vulnerabilities to hide, so I'm in favor of making this change.

@larseggert
Copy link
Member

@larseggert larseggert commented Feb 5, 2020

Discussed in ZRH. Proposed resolution is to make this change and apply #3169.

@larseggert larseggert added the proposal-ready label Feb 5, 2020
@project-bot project-bot bot moved this from Design Issues to Consensus Emerging in Late Stage Processing Feb 5, 2020
@DavidSchinazi
Copy link
Contributor Author

@DavidSchinazi DavidSchinazi commented Feb 5, 2020

Following feedback from the room, I've reverted the preferred_address part of this change, and as far as I know this is ready to merge once we confirm consensus on the list.

@LPardue
Copy link
Member

@LPardue LPardue commented Feb 5, 2020

Discussion of the preferred_address requires a separate issue, so whomever is a proponent for that please create one.

@DavidSchinazi
Copy link
Contributor Author

@DavidSchinazi DavidSchinazi commented Feb 5, 2020

@LPardue the preferred_address change is discussed in issue #2944.

@MikeBishop
Copy link
Contributor

@MikeBishop MikeBishop commented Feb 12, 2020

In fairness, the outcome documented on that issue and the minutes from Cupertino was that, if we decide to make this change, the modified layout would be considered as part of it.

@martinthomson
Copy link
Member

@martinthomson martinthomson commented Feb 12, 2020

I believe that consideration was given. Maybe if you were advocating for the change, the cursory manner of its dismissal might be unsatisfactory, but ample opportunity was given to argue for including the change.

@LPardue LPardue moved this from Consensus Emerging to Consensus Call issued in Late Stage Processing Feb 17, 2020
@LPardue LPardue removed the proposal-ready label Feb 17, 2020
@LPardue LPardue moved this from Consensus Call issued to Consensus Declared in Late Stage Processing Feb 19, 2020
@LPardue LPardue added the has-consensus label Feb 19, 2020
Late Stage Processing automation moved this from Consensus Declared to Text Incorporated Feb 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
-transport design has-consensus
Projects
Late Stage Processing
  
Issue Handled
Development

Successfully merging a pull request may close this issue.