
Conflicting error codes #4087

Closed
tatsuhiro-t opened this issue Sep 10, 2020 · 16 comments · Fixed by #4088
Labels: -tls · call-issued (An issue that the Chairs have issued a Consensus call for.) · design (An issue that affects the design of the protocol; resolution requires consensus.)

Comments

@tatsuhiro-t
Contributor

tatsuhiro-t commented Sep 10, 2020

Transport draft-30 introduced the AEAD_LIMIT_REACHED (0xe) error code.
Meanwhile, the TLS draft also defines the KEY_UPDATE_ERROR error code as 0xe, so the two collide.
I believe these are transport error codes, not application error codes.

@marten-seemann
Contributor

It would probably be a good idea to list KEY_UPDATE_ERROR in the error code table in the transport document.

@martinthomson
Member

Ugh, that's unfortunate. I think that we need to renumber the newer one (and yeah, put them all in the same document so it doesn't happen again).

martinthomson added a commit that referenced this issue Sep 10, 2020
I guess technically we could have let both of these stand and used the
same error code for both conditions, but that seems unnecessary.  Moving
the AEAD limit seems easier.

Closes #4087.
@larseggert larseggert added this to Triage in Late Stage Processing via automation Sep 10, 2020
@DavidSchinazi
Contributor

I believe we should fix the fact that we have two errors with the same code; however, I think it would be best to just use 0xE for all of these scenarios. At this point in the process, we're working on deploying h3-29, and a breaking wire change like this would be very unfortunate. Additionally, I cannot think of any action an implementation would take differently for KEY_UPDATE_ERROR versus AEAD_LIMIT_REACHED. Any chance we could just rename AEAD_LIMIT_REACHED to KEY_UPDATE_ERROR and call it a day?

@ekr
Collaborator

ekr commented Sep 10, 2020 via email

@janaiyengar
Contributor

I agree with @ekr. There is a separate question of whether -31 will be a new version or not, but let's not get ahead of ourselves.

@huitema
Contributor

huitema commented Sep 14, 2020

I am trying to write a test for the generation of the key update error code. This is a bit hard, since by default an implementation rotates its keys once it has sent close to the limit of packets, so the condition is never seen. One could test that the peer has sent more than the limit of packets without rotating its keys, but then the exact threshold is fuzzy. For example, for AES-GCM, should that be "received 2^25 packets" or "2^25 + 1"?

The error is also a bit silly. It requires tracking how many packets were decrypted with a given key, and then disconnecting with a specific error if the limit is exceeded. But the node that tracks the limit could also unilaterally trigger a key rotation when the receive count gets close to the limit. Of course, a non-conforming peer could refuse to update its own key and never rotate the key phase. Is that the condition for KEY_UPDATE_ERROR?

@janaiyengar
Contributor

@huitema: this issue is merely changing the error code for an error that already exists. I read your comment as a question about testing this, but you could be asking about when this error code ought to be sent.

My reading is that the sender is required to close the connection when it sends too many packets, and the receiver is required to close the connection when too many packets fail authentication, and I think the text is clear on precisely when both of these ought to happen.

A receiver is not required to do what you're suggesting, but it could. Honestly, the precise value, "2^25 + 1" or "2^25", wouldn't matter IMO, since the receiver can only know a lower bound on the number of packets sent with the same key (due to packet loss, etc.). But if the receiver were to close with this error, doing so at the point where the sender no longer has the ability to update its keys makes sense. The simplest thing to do would be to check whether "2^25 + 1" packets have been received with the same key.
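The receiver-side check discussed above can be sketched as follows. This is a minimal, illustrative Go sketch, not code from any implementation in this thread; the type and function names are hypothetical, and 2^25 is simply the AES-GCM figure used in this discussion:

```go
package main

import "fmt"

// Confidentiality limit figure used in this thread for AES-GCM (illustrative).
const aeadConfidentialityLimit = 1 << 25

// receiverKeyState tracks packets successfully unprotected with the current key.
type receiverKeyState struct {
	received uint64
}

// onPacketDecrypted counts a successfully decrypted packet and reports whether
// the peer has clearly exceeded the limit for this key (the "2^25 + 1" check).
// The received count is only a lower bound on what the peer actually sent, so
// this check errs on the permissive side.
func (s *receiverKeyState) onPacketDecrypted() bool {
	s.received++
	return s.received > aeadConfidentialityLimit
}

func main() {
	// Simulate having already decrypted exactly the limit of packets;
	// the next packet trips the check.
	s := &receiverKeyState{received: aeadConfidentialityLimit}
	fmt.Println(s.onPacketDecrypted())
}
```

As noted above, a receiver is not required to run this check; a friendlier alternative is to initiate a key update itself well before the count approaches the limit.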

@huitema
Contributor

huitema commented Sep 14, 2020

@janaiyengar the sender is NOT required to close the connection when it sends too many packets; that's not what the TLS limits say. The requirement is to use key rotation: move to the next epoch and start using a new encryption key.

Reading the fine print, I believe KEY_UPDATE_ERROR will happen if the node starts a key rotation but the peer keeps using the old key for too long, for example if the peer continues using the old key while acknowledging packets sent with the new key. AEAD_LIMIT_REACHED will happen if the peer transmits too many packets within a single epoch.

This is different from the integrity limit, which does require closing the connection if too many decryption failures are observed.
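The distinction drawn above can be summarized in a small sketch. The error code values are the ones as renumbered by the fix for this issue; the condition parameters and the `errorFor` helper are illustrative paraphrases, not API from any draft or implementation:

```go
package main

import "fmt"

// Transport error codes after the renumbering proposed for this issue.
const (
	keyUpdateError   = 0x0e // peer keeps using old keys after a key update
	aeadLimitReached = 0x0f // AEAD integrity limit exceeded
)

// errorFor maps the two conditions discussed above to an error code.
// It returns (code, true) if the connection must be closed, (0, false) otherwise.
func errorFor(decryptionFailures, integrityLimit uint64, peerStuckInOldPhase bool) (uint64, bool) {
	// The integrity limit does require closing: too many failed decryptions.
	if decryptionFailures > integrityLimit {
		return aeadLimitReached, true
	}
	// Key rotation started, but the peer never moved to the new key phase.
	if peerStuckInOldPhase {
		return keyUpdateError, true
	}
	return 0, false
}

func main() {
	code, ok := errorFor(5, 4, false)
	fmt.Println(ok, code == aeadLimitReached)
}
```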

@janaiyengar
Contributor

@huitema -- yes, I didn't read carefully; you're right.

So, as I suggested earlier, IMO the best you can do at the receiver is to check for 2^25 + 1, even though the received count is only a lower bound. It does require tracking the number of packets, as you say.

@huitema
Contributor

huitema commented Sep 15, 2020

I can't wait to see the first error report in which some poor dude tries to transmit more than 50GB over a QUIC connection and hits a bug in the implementation of key update...

@marten-seemann
Contributor

@huitema There's another condition where KEY_UPDATE_ERROR might occur: if you acknowledge a packet in key phase N+1 using key phase N. It's just a MAY in the specification though (I don't know why; this should be trivial to implement):

An endpoint that receives an acknowledgement that is carried in a packet protected with old keys where any acknowledged packet was protected with newer keys MAY treat that as a connection error of type KEY_UPDATE_ERROR. This indicates that a peer has received and acknowledged a packet that initiates a key update, but has not updated keys in response.

@huitema

I can't wait to see the first error report in which some poor dude tries to transmit more than 50GB over a QUIC connection and hits a bug in the implementation of key update...

That's why we recently added a key update test to the interop runner. Unfortunately, it looks like a number of implementations (even those that already boasted about rolling out QUIC into production) treat key updates as some kind of optional feature, which it isn't. This is bad, as this now prevents others from rolling out QUIC implementations that initiate a key update early in the connection to exercise the mechanism.

@martinthomson
Member

@marten-seemann, it would be non-trivial for neqo to implement that MAY. The information about what keys are used is quite well separated from the code that manages acknowledgments.

@marten-seemann
Contributor

@martinthomson That's the same in quic-go.
There's another way to detect this condition, though: since you need to keep track of the largest acknowledged PN anyway, you can check, when you receive an ACK for a packet sent in the current key phase, whether you have unprotected at least one packet in the current key phase. If you haven't, the ACK must have been sent in the previous key phase, and you can throw an error.
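The check described above can be sketched in a few lines. This is a hypothetical Go sketch, not quic-go's actual code; all names are illustrative, and it assumes the endpoint records the first packet number it sent after updating keys:

```go
package main

import "fmt"

// keyUpdateChecker implements the ACK/key-phase consistency check described above.
type keyUpdateChecker struct {
	firstPNInCurrentPhase uint64 // lowest packet number we sent in the current key phase
	seenCurrentPhase      bool   // unprotected at least one packet in the current phase?
}

// onPacketUnprotected records whether a successfully unprotected incoming
// packet was in the current key phase. The packet carrying the ACK itself
// counts, so this must run before the ACK frames are processed.
func (c *keyUpdateChecker) onPacketUnprotected(currentPhase bool) {
	if currentPhase {
		c.seenCurrentPhase = true
	}
}

// onAckProcessed returns true if the peer acknowledged a packet from the
// current key phase without our ever having received a packet in that phase,
// i.e. the ACK must have been protected with the old keys: KEY_UPDATE_ERROR.
func (c *keyUpdateChecker) onAckProcessed(largestAcked uint64) bool {
	return largestAcked >= c.firstPNInCurrentPhase && !c.seenCurrentPhase
}

func main() {
	c := &keyUpdateChecker{firstPNInCurrentPhase: 100}
	c.onPacketUnprotected(false)       // the ACK-carrying packet still uses the old phase
	fmt.Println(c.onAckProcessed(101)) // yet it acknowledges a new-phase packet: error
}
```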

@huitema
Contributor

huitema commented Sep 15, 2020

@marten-seemann you and I were thinking of the same error condition, which I described as "for example if the peer continues using the old key but acknowledges packets sent with the new key".
And yes, I was also going to test that by implementing the "last acknowledged" check.

@marten-seemann
Contributor

@huitema I just implemented this check in quic-go, and it was literally less than 10 LOC. Going to roll that out in my interop runner image soon, so we'll see if people are responding to key updates correctly.

@larseggert larseggert added the design An issue that affects the design of the protocol; resolution requires consensus. label Sep 16, 2020
@project-bot project-bot bot moved this from Triage to Design Issues in Late Stage Processing Sep 16, 2020
@larseggert larseggert moved this from Design Issues to Triage in Late Stage Processing Sep 16, 2020
@larseggert larseggert moved this from Triage to Design Issues in Late Stage Processing Sep 16, 2020
@larseggert larseggert moved this from Design Issues to Consensus Emerging in Late Stage Processing Sep 17, 2020
@larseggert larseggert moved this from Consensus Emerging to Consensus Call issued in Late Stage Processing Sep 17, 2020
@DavidSchinazi
Contributor

It looks like I'm alone in preferring to merge the two codes, so I just wanted to say that I don't feel too strongly here. If there's WG consensus to merge #4088, then I'm OK with that.

@martinthomson martinthomson added the proposal-ready An issue which has a proposal that is believed to be ready for a consensus call. label Sep 21, 2020
@project-bot project-bot bot moved this from Consensus Call issued to Consensus Emerging in Late Stage Processing Sep 21, 2020
@martinthomson martinthomson added call-issued An issue that the Chairs have issued a Consensus call for. and removed proposal-ready An issue which has a proposal that is believed to be ready for a consensus call. labels Sep 21, 2020
@project-bot project-bot bot moved this from Consensus Emerging to Consensus Call issued in Late Stage Processing Sep 21, 2020
Late Stage Processing automation moved this from Consensus Call issued to Issue Handled Sep 24, 2020