Revised considerations for 0-RTT #59
Lots of comments, sorry.
draft-ietf-dprive-dnsoquic.md
When resuming a session, a client MAY take advantage of the 0-RTT mechanism
supported by QUIC. The 0-RTT mechanism MUST NOT be used to send data that is
The 0-RTT mechanism MUST NOT be used to send data that is
not "replayable" transactions. For example, a client MAY transmit a Query as
It seems to me like you need a better definition of "replayable".
@saradickinson I would like your advice here. The DNS operations are defined in an IANA registry that we could quote -- https://www.iana.org/assignments/dns-parameters/dns-parameters.xhtml#dns-parameters-5. The registry currently defines the following:
OpCode | Name | Reference |
---|---|---|
0 | Query | [RFC1035] |
1 | IQuery (Inverse Query, OBSOLETE) | [RFC3425] |
2 | Status | [RFC1035] |
3 | Unassigned | |
4 | Notify | [RFC1996] |
5 | Update | [RFC2136] |
6 | DNS Stateful Operations (DSO) | [RFC8490] |
7-15 | Unassigned | |
Of all of these, I see the benefit of allowing `Query` in 0-RTT, and the danger of allowing `Update` or `DSO`. I am not sure about `IQuery`, `Status` and `Notify`. Advice?
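The tentative split proposed in this comment can be sketched as a lookup on the OPCODE field of the DNS header. This is only an illustration of the question being asked, not draft text: the names `ZERO_RTT_POLICY`, `opcode_of` and `zero_rtt_policy` are hypothetical, the "undecided" bucket reflects the open question above, and treating unassigned opcodes as "deny" is a conservative assumption of this sketch.

```python
from enum import IntEnum


class OpCode(IntEnum):
    """Assigned values from the IANA DNS OpCodes registry quoted above."""
    QUERY = 0
    IQUERY = 1
    STATUS = 2
    NOTIFY = 4
    UPDATE = 5
    DSO = 6


# Hypothetical policy table mirroring the comment: Query is the useful
# case, Update and DSO are the dangerous ones, the rest are open questions.
ZERO_RTT_POLICY = {
    OpCode.QUERY: "allow",
    OpCode.UPDATE: "deny",
    OpCode.DSO: "deny",
    OpCode.IQUERY: "undecided",
    OpCode.STATUS: "undecided",
    OpCode.NOTIFY: "undecided",
}


def opcode_of(message: bytes) -> int:
    """Extract the 4-bit OPCODE from a DNS message header (RFC 1035, 4.1.1).

    Expects the bare DNS message, starting at the 2-byte ID field.
    """
    if len(message) < 12:
        raise ValueError("a DNS header is 12 bytes long")
    # Third header byte layout: QR(1) | OPCODE(4) | AA | TC | RD
    return (message[2] >> 3) & 0x0F


def zero_rtt_policy(message: bytes) -> str:
    """Return 'allow', 'deny' or 'undecided' for a message seen in 0-RTT."""
    code = opcode_of(message)
    try:
        return ZERO_RTT_POLICY[OpCode(code)]
    except ValueError:
        # Assumption of this sketch: unassigned opcodes are not replay-safe.
        return "deny"
```

For example, a header whose third byte is `0x28` carries OPCODE 5 (`Update`) and would be classified as "deny".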
Since `IQuery` is obsolete, I don't think there is any danger (it generates a NotImp AFAIK). `Status` is similar; there was an effort 2 years ago to deprecate it, but it failed because... well, DNSOP :-( Servers respond with one of NotImp or Refused, or give no response at all.
`Notify` is worth thinking about. Since it only affects secondaries and it can already be abused to trigger unnecessary SOA or XFR requests to a primary, most implementations already throttle the speed at which they will send those requests in response to one or more `Notify` messages. Some operators require Notifies to be TSIG signed before acting on them (which is also true of `Update`). TSIG does provide a defence against replay, but note that RFC8945 recommends a default Fudge value (i.e. time offset for validation) of 300s.
So I think the text should say that only queries with operation type "Query" should be considered in 0-RTT. Is there a case where some servers should not allow some specific queries, e.g., because they are some kind of "active" query that starts a script?
This is what https://datatracker.ietf.org/doc/html/draft-ietf-dprive-early-data-00 proposed, but this was questioned in the following discussion that then petered out... (an IANA registry of allowed QTYPES was proposed). I notice that that doc was adopted by DPRIVE but then seemed to stall and has now expired.
I would rather not quote an expired draft. I also agree with Martin's general recommendation to use application level response codes rather than QUIC errors. Whenever we create a QUIC error, implementations end up having to map the QUIC error to some kind of transport semantic. How about giving servers the following choices:
- Process the request, if the operation type is "QUERY" and the query can be processed without side effect,
- Defer processing until the connection has been confirmed by the client (quote RFC 9001),
- Reject the request and prepare a DNS REFUSED error.
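The three choices above could be sketched as a small decision function. This is a sketch under stated assumptions, not proposed draft text: `EarlyRequest`, `Action` and `decide` are hypothetical names, `has_side_effects` stands for whatever server-specific test identifies "active" queries, and preferring deferral over refusal when buffering is possible is an ordering chosen for this sketch only.

```python
from dataclasses import dataclass
from enum import Enum, auto

OPCODE_QUERY = 0    # RFC 1035
RCODE_REFUSED = 5   # RFC 1035; the response code a refusal would carry


class Action(Enum):
    PROCESS = auto()  # answer immediately
    DEFER = auto()    # hold until the handshake is confirmed
    REFUSE = auto()   # answer with RCODE_REFUSED


@dataclass
class EarlyRequest:
    opcode: int
    has_side_effects: bool  # hypothetical server-side judgement


def decide(req: EarlyRequest, handshake_confirmed: bool, can_buffer: bool) -> Action:
    """Pick one of the three choices for a request that arrived in 0-RTT."""
    if handshake_confirmed:
        # No longer early data from the server's point of view.
        return Action.PROCESS
    if req.opcode == OPCODE_QUERY and not req.has_side_effects:
        return Action.PROCESS
    if can_buffer:
        # Assumption of this sketch: deferring is preferred when affordable.
        return Action.DEFER
    return Action.REFUSE
```

For instance, an `Update` (opcode 5) received before confirmation is deferred when the server can buffer it and refused otherwise.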
If the WG finally produces a 0-RTT draft, it would update DoQ.
If we want to do a draft update to generate discussion in the WG, then I would be OK with merging the as long as we have
Co-authored-by: Martin Thomson <mt@lowentropy.net>
…into more-robust-0rtt
I'm happy for some of my comments to be taken to other issues for follow-up, but there are still quite a few of them that are not resolved. Sara's questions are good ones (I think that the REFUSED + EDE(TOO_EARLY) thing belongs in the other draft, for instance) for taking to the list. For transparency, and to add to Sara's list:
@martinthomson: the last commit tries to address your concerns. I tried removing some of the redundancy with TLS, replacing text by reference to section 8 and appendix C.4 of TLS 1.3. I also removed the time-skew recommendation from the implementation requirements, but added a discussion of its effects in the privacy considerations. Ticket issuance recommendation simplified to just "make the duration long enough". I still need to work on the "replayable transactions" text.
The "tighter definition" commit revises the anti-replay consideration, aligning with our emerging consensus that servers must either delay processing, refuse transactions with EDE "too early", or just close the connection. Not perfect, but most probably good enough to have a discussion with the WG. I added the reservation of the "too early" EDE. I also tried to amend the text about freshness tests in the privacy considerations that @martinthomson described as too optimistic. At this point, I would like to check in this PR, so we can have a basis for new discussions.
Yes, completed is sufficient. I was a bit too cautious. I certainly agree that smaller delays are better. On the other hand, just saying small begs the question, how small? Small compared to what? The cache attacks involve observing encrypted traffic, and then assessing the state of the cache after the client's message arrives at the recursive server. If the client request triggered a refresh of the cache, the attacker will find a TTL close to the maximum value minus the delay between assessment and replay. If the TTL is smaller than that, this means the value was probably in the cache before the replay, so the attacker cannot conclude that the target name was looked up. The generic cache attack does not require replaying 0-RTT data -- it also works against 1-RTT data. The attackers can increase the efficiency of the generic cache attack by assessing the state of the cache before the client's message arrives. That's where 0-RTT replay is useful, because the attacker can choose the time at which it does the replay. Once the attacker knows the TTL of the target record in the cache, it can wait until that TTL expires and then do the replay, and get an assessment that is much less ambiguous. But of course the freshness test limits how much the attacker can delay the replay. That's what, in my mind, causes the link between the maximum tolerance of that test and the potency of the attack.
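The arithmetic behind this argument can be written out as a small model. It is only an illustration of the reasoning in this comment, under simplifying assumptions: all function and parameter names are hypothetical, times are in seconds, and `slack_s` is an arbitrary allowance for measurement jitter.

```python
def expected_ttl_after_refresh(max_ttl_s: float, delay_s: float) -> float:
    """TTL the attacker expects to see if the replay refreshed the cache.

    delay_s is the delay between the replay and the attacker's assessment.
    """
    return max_ttl_s - delay_s


def replay_caused_refresh(observed_ttl_s: float, max_ttl_s: float,
                          delay_s: float, slack_s: float = 1.0) -> bool:
    """True if the observed TTL is consistent with a refresh by the replay.

    A clearly smaller TTL means the record was probably cached already,
    so the attacker learns nothing about the replayed query.
    """
    return observed_ttl_s >= expected_ttl_after_refresh(max_ttl_s, delay_s) - slack_s


def can_time_replay_to_expiry(remaining_ttl_s: float,
                              freshness_tolerance_s: float) -> bool:
    """True if the attacker can hold the 0-RTT data until the cached record
    expires and still pass the server's freshness test."""
    return remaining_ttl_s <= freshness_tolerance_s
```

With a 300 s maximum TTL and a 2 s delay, an observed TTL of 298 s points to a refresh while 120 s does not. With a 10 s tolerance, the attacker can only wait out records that expire within 10 s, which is the link between the tolerance and the potency of the attack described above.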
Yes, the question of "how small" is always "as small as possible", which is unsatisfactory. I was thinking that if the attacker wanted to confirm the contents of a query, they would replay that query until the cache entry is evicted. If the original query caused the cache entry to be created, your logic holds, but that is not necessary to mount the attack in question. The attacker only needs to watch for the edge caused by cache eviction. That is an increase in response time and maybe a request to the authoritative. They can then correlate that with the known cache expiry time of the record. As you say, the attacker can learn when the record expires through their own queries or by observing when the authoritative was queried; assuming that the client populated the cache is not helpful. The true defense against this sort of thing is likely k-anonymity. If there could be k >> 1 records that could have expired at that time, then the attacker only knows that one of those k records was queried for. That is hard to achieve if you have strong traffic analysis that supports linking a query from a stub to a query to an authoritative, especially when an authoritative server doesn't receive much traffic.
What is the limit to "as small as possible"? Clearly, that depends on how precise the clocks are, whether they may drift, etc. I know that in picotls, Kazuho set that to 10 seconds. Could it be smaller? 1 second? 100 milliseconds? Should we ask the TLS working group for guidance?
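For reference, the freshness test being tuned here follows the computation described in section 8.3 of TLS 1.3 (RFC 8446). The sketch below is a simplified model, not any implementation's actual code: the function and parameter names are hypothetical, and the 10 second default only echoes the picotls figure mentioned in this comment.

```python
def early_data_is_fresh(ticket_creation_ms: int,
                        obfuscated_ticket_age: int,
                        ticket_age_add: int,
                        now_ms: int,
                        rtt_estimate_ms: int = 0,
                        tolerance_ms: int = 10_000) -> bool:
    """Freshness check in the style of RFC 8446, section 8.3.

    The client sends (ticket age in ms + ticket_age_add) mod 2**32; the
    server undoes the obfuscation, predicts when the ClientHello should
    arrive, and rejects 0-RTT if its own clock disagrees by too much.
    """
    client_ticket_age_ms = (obfuscated_ticket_age - ticket_age_add) % 2**32
    adjusted_creation_ms = ticket_creation_ms + rtt_estimate_ms
    expected_arrival_ms = adjusted_creation_ms + client_ticket_age_ms
    return abs(now_ms - expected_arrival_ms) <= tolerance_ms
```

Shrinking `tolerance_ms` narrows the window available to a replaying attacker, at the price of rejecting more legitimate clients whose clocks drift or whose path delay varies.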
The importance of the issue also depends on the scenario. Are we really concerned about that attack in the recursive to authoritative scenario? And then, if we are concerned mostly about recursive resolvers, are there specific defenses that these resolvers could implement, like your description of k-anonymity? Maybe something as simple as "if I have to refresh 1 cache entry, I will refresh K entries at the same time?" Not that I would want to specify that now, but we could have some weasel text...
OK, I thought it over, and I think we have a solution for the 0-RTT issue. Basically, align with section 8 of TLS 1.3. The issue of course is that the recommended mechanisms in TLS 1.3 require shared state between all servers in a "system". In general, that's hard. But in the case of DNS over QUIC, the system is defined by "all servers that share the same DNS cache", and that's much more palatable. For example, if all servers that share the same cache can also cache the identifiers of 0-RTT requests received for the duration of the freshness checks, then we have an effective mitigation against replay attacks.
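A minimal sketch of that mitigation, assuming a single in-process store stands in for the state shared by all servers behind one DNS cache. The class name and the choice of identifier are hypothetical; RFC 8446 section 8.2 suggests a unique value derived from the ClientHello, such as the random value or the PSK binder.

```python
import time


class ZeroRttReplayCache:
    """Remembers 0-RTT identifiers for the duration of the freshness window."""

    def __init__(self, window_seconds: float = 10.0, clock=time.monotonic):
        self._window = window_seconds
        self._clock = clock
        self._seen = {}  # identifier -> time at which it may be forgotten

    def check_and_record(self, identifier: bytes) -> bool:
        """Return True on first sighting (accept 0-RTT), False on a replay."""
        now = self._clock()
        # Drop entries older than the window; a replay that late is
        # expected to fail the freshness test anyway.
        for key in [k for k, expiry in self._seen.items() if expiry <= now]:
            del self._seen[key]
        if identifier in self._seen:
            return False
        self._seen[identifier] = now + self._window
        return True
```

The memory cost is bounded by the rate of 0-RTT attempts times the freshness window, which is one more reason to keep that window small. A real deployment would need the store to be shared, with an atomic check-and-insert, across every server that fronts the same cache.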
Revised the text of the 0-RTT recommendations, based on feedback from Robert Evans and @martinthomson