
Conflicting error codes #4087

Closed
tatsuhiro-t opened this issue Sep 10, 2020 · 16 comments · Fixed by #4088
Labels: -tls · call-issued (An issue that the Chairs have issued a Consensus call for.) · design (An issue that affects the design of the protocol; resolution requires consensus.)

Comments

@tatsuhiro-t
Contributor

tatsuhiro-t commented Sep 10, 2020

Transport draft-30 introduced the AEAD_LIMIT_REACHED (0xe) error code.
Meanwhile, the TLS draft also defines the KEY_UPDATE_ERROR error code as 0xe, so the two collide.
I believe these are transport error codes, not application error codes.

@marten-seemann
Contributor

It would probably be a good idea to list KEY_UPDATE_ERROR in the error code table in the transport document.

@martinthomson
Member

Ugh, that's unfortunate. I think that we need to renumber the newer one (and yeah, put them all in the same document so it doesn't happen again).

martinthomson added a commit that referenced this issue Sep 10, 2020
I guess technically we could have let both of these stand and used the
same error code for both conditions, but that seems unnecessary.  Moving
the AEAD limit seems easier.

Closes #4087.
@larseggert larseggert added this to Triage in Late Stage Processing via automation Sep 10, 2020
@DavidSchinazi
Contributor

I believe we should fix the fact that we have two errors with the same code; however, I think it would be best to just use 0xE for all of these scenarios. At this point in the process, we're working on deploying h3-29, and a breaking wire change like this would be very unfortunate. Additionally, I cannot think of any action an implementation would take differently for KEY_UPDATE_ERROR versus AEAD_LIMIT_REACHED. Any chance we could just rename AEAD_LIMIT_REACHED to KEY_UPDATE_ERROR and call it a day?

@ekr
Collaborator

ekr commented Sep 10, 2020 via email

@janaiyengar
Contributor

I agree with @ekr. There is a separate question of whether -31 will be a new version or not, but let's not get ahead of ourselves.

@huitema
Contributor

huitema commented Sep 14, 2020

I am trying to write a test for the generation of the key update error code. This is a bit hard, since by default an implementation rotates its keys once it has sent close to the limit of packets, so the condition is never seen. One could test that the peer has sent more than the limit of packets without rotating its keys, but then the exact threshold is fuzzy. For example, for AES-GCM, should that be "received 2^25 packets" or "2^25 + 1"?

The error is also a bit silly. It requires tracking how many packets were decrypted with a given key, and then disconnecting with a specific error if the limit is exceeded. But the node that tracks the limit could also unilaterally trigger a key rotation when the receive count gets close to the limit. Of course, a non-conforming peer could refuse to update its own key and never rotate the key phase. Is that the condition for KEY_UPDATE_ERROR?

@janaiyengar
Contributor

@huitema: this issue is merely changing the error code for an error that already exists. I read your comment as a question about testing this, but you could be asking about when this error code ought to be sent.

My reading is that the sender is required to close the connection when it sends too many packets, and the receiver is required to close the connection when too many packets fail authentication, and I think the text is clear on precisely when both of these ought to happen.

A receiver is not required to do what you're suggesting, but it could. Honestly, the precise value, "2^25 + 1" or "2^25", wouldn't matter IMO, since the receiver can only know a lower bound on the number of packets sent with the same key (due to packet loss, etc.). But if the receiver were to close with this error, doing so at the point where the sender no longer has the ability to update its keys makes sense. The simplest thing to do would be to check whether "2^25 + 1" packets have been received with the same key.
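The receiver-side check discussed above can be sketched as follows. This is a minimal, illustrative Go sketch, not code from any implementation in this thread; the type and function names are hypothetical, and 2^25 is simply the AES-GCM figure used in this discussion:

```go
package main

import "fmt"

// Confidentiality limit figure used in this thread for AES-GCM (illustrative).
const aeadConfidentialityLimit = 1 << 25

// receiverKeyState tracks packets successfully unprotected with the current key.
type receiverKeyState struct {
	received uint64
}

// onPacketDecrypted counts a successfully decrypted packet and reports whether
// the peer has clearly exceeded the limit for this key (the "2^25 + 1" check).
// The received count is only a lower bound on what the peer actually sent, so
// this check errs on the permissive side.
func (s *receiverKeyState) onPacketDecrypted() bool {
	s.received++
	return s.received > aeadConfidentialityLimit
}

func main() {
	// Simulate having already decrypted exactly the limit of packets;
	// the next packet trips the check.
	s := &receiverKeyState{received: aeadConfidentialityLimit}
	fmt.Println(s.onPacketDecrypted())
}
```

As noted above, a receiver is not required to run this check; a friendlier alternative is to initiate a key update itself well before the count approaches the limit.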

@huitema
Contributor

huitema commented Sep 14, 2020

@janaiyengar the sender is NOT required to close the connection when it sends too many packets; that's not what the TLS limits say. The requirement is to use key rotation: move to the next epoch and start using a new encryption key.

Reading the fine print, I believe KEY_UPDATE_ERROR will happen if the node starts a key rotation but the peer keeps using the old key for too long, for example if the peer continues using the old key while acknowledging packets sent with the new key. AEAD_LIMIT_REACHED will happen if the peer transmits too many packets within a single epoch.

This is different from the integrity limit, which does require closing the connection if too many decryption failures are observed.
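The distinction drawn above can be summarized in a small sketch. The error code values are the ones as renumbered by the fix for this issue; the condition parameters and the `errorFor` helper are illustrative paraphrases, not API from any draft or implementation:

```go
package main

import "fmt"

// Transport error codes after the renumbering proposed for this issue.
const (
	keyUpdateError   = 0x0e // peer keeps using old keys after a key update
	aeadLimitReached = 0x0f // AEAD integrity limit exceeded
)

// errorFor maps the two conditions discussed above to an error code.
// It returns (code, true) if the connection must be closed, (0, false) otherwise.
func errorFor(decryptionFailures, integrityLimit uint64, peerStuckInOldPhase bool) (uint64, bool) {
	// The integrity limit does require closing: too many failed decryptions.
	if decryptionFailures > integrityLimit {
		return aeadLimitReached, true
	}
	// Key rotation started, but the peer never moved to the new key phase.
	if peerStuckInOldPhase {
		return keyUpdateError, true
	}
	return 0, false
}

func main() {
	code, ok := errorFor(5, 4, false)
	fmt.Println(ok, code == aeadLimitReached)
}
```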

@janaiyengar
Contributor

@huitema -- yes, I didn't read carefully; you're right.

So, as I suggested earlier, IMO the best you can do at the receiver is to check for 2^25 + 1, even though the received count is only a lower bound. It does require tracking the number of packets, as you say.

@huitema
Contributor

huitema commented Sep 15, 2020

I can't wait to see the first error report in which some poor dude tries to transmit more than 50GB over a QUIC connection and hits a bug in the implementation of key update...

@marten-seemann
Contributor

@huitema There's another condition where KEY_UPDATE_ERROR might occur: if you acknowledge a packet in key phase N+1 using key phase N. It's just a MAY in the specification though (I don't know why; this should be trivial to implement):

An endpoint that receives an acknowledgement that is carried in a packet protected with old keys where any acknowledged packet was protected with newer keys MAY treat that as a connection error of type KEY_UPDATE_ERROR. This indicates that a peer has received and acknowledged a packet that initiates a key update, but has not updated keys in response.

@huitema

I can't wait to see the first error report in which some poor dude tries to transmit more than 50GB over a QUIC connection and hits a bug in the implementation of key update...

That's why we recently added a key update test to the interop runner. Unfortunately, it looks like a number of implementations (even those that already boasted about rolling out QUIC into production) treat key updates as some kind of optional feature, which it isn't. This is bad, as this now prevents others from rolling out QUIC implementations that initiate a key update early in the connection to exercise the mechanism.

@martinthomson
Member

@marten-seemann, it would be non-trivial for neqo to implement that MAY. The information about what keys are used is quite well separated from the code that manages acknowledgments.

@marten-seemann
Contributor

@martinthomson That's the same in quic-go.
There's another way to detect this condition, though: since you need to keep track of the largest acknowledged PN anyway, you can check, when you receive an ACK for a packet sent in the current key phase, whether you have unprotected at least one packet in the current key phase. If you haven't, the ACK must have been sent in the previous key phase, and you can throw an error.
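The check described above can be sketched in a few lines. This is a hypothetical Go sketch, not quic-go's actual code; all names are illustrative, and it assumes the endpoint records the first packet number it sent after updating keys:

```go
package main

import "fmt"

// keyUpdateChecker implements the ACK/key-phase consistency check described above.
type keyUpdateChecker struct {
	firstPNInCurrentPhase uint64 // lowest packet number we sent in the current key phase
	seenCurrentPhase      bool   // unprotected at least one packet in the current phase?
}

// onPacketUnprotected records whether a successfully unprotected incoming
// packet was in the current key phase. The packet carrying the ACK itself
// counts, so this must run before the ACK frames are processed.
func (c *keyUpdateChecker) onPacketUnprotected(currentPhase bool) {
	if currentPhase {
		c.seenCurrentPhase = true
	}
}

// onAckProcessed returns true if the peer acknowledged a packet from the
// current key phase without our ever having received a packet in that phase,
// i.e. the ACK must have been protected with the old keys: KEY_UPDATE_ERROR.
func (c *keyUpdateChecker) onAckProcessed(largestAcked uint64) bool {
	return largestAcked >= c.firstPNInCurrentPhase && !c.seenCurrentPhase
}

func main() {
	c := &keyUpdateChecker{firstPNInCurrentPhase: 100}
	c.onPacketUnprotected(false)       // the ACK-carrying packet still uses the old phase
	fmt.Println(c.onAckProcessed(101)) // yet it acknowledges a new-phase packet: error
}
```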

@huitema
Contributor

huitema commented Sep 15, 2020

@marten-seemann you and I were thinking of the same error condition, which I described as "for example if the peer continues using the old key but acknowledges packets sent with the new key".
And yes, I was also going to test that by implementing the "last acknowledged" check.

@marten-seemann
Contributor

@huitema I just implemented this check in quic-go, and it was literally less than 10 LOC. Going to roll that out in my interop runner image soon, so we'll see if people are responding to key updates correctly.

@larseggert larseggert added the design An issue that affects the design of the protocol; resolution requires consensus. label Sep 16, 2020
@project-bot project-bot bot moved this from Triage to Design Issues in Late Stage Processing Sep 16, 2020
@larseggert larseggert moved this from Design Issues to Triage in Late Stage Processing Sep 16, 2020
@larseggert larseggert moved this from Triage to Design Issues in Late Stage Processing Sep 16, 2020
@larseggert larseggert moved this from Design Issues to Consensus Emerging in Late Stage Processing Sep 17, 2020
@larseggert larseggert moved this from Consensus Emerging to Consensus Call issued in Late Stage Processing Sep 17, 2020
@DavidSchinazi
Contributor

It looks like I'm alone in preferring to merge the two codes, so I just wanted to say that I don't feel too strongly here. If there's WG consensus to merge #4088, then I'm OK with that.

@martinthomson martinthomson added the proposal-ready An issue which has a proposal that is believed to be ready for a consensus call. label Sep 21, 2020
@project-bot project-bot bot moved this from Consensus Call issued to Consensus Emerging in Late Stage Processing Sep 21, 2020
@martinthomson martinthomson added call-issued An issue that the Chairs have issued a Consensus call for. and removed proposal-ready An issue which has a proposal that is believed to be ready for a consensus call. labels Sep 21, 2020
@project-bot project-bot bot moved this from Consensus Emerging to Consensus Call issued in Late Stage Processing Sep 21, 2020
Late Stage Processing automation moved this from Consensus Call issued to Issue Handled Sep 24, 2020