
Revised considerations for 0-RTT #59

Merged
merged 18 commits into from
Aug 12, 2021

@huitema (Owner) commented Jul 31, 2021

Revised the text of the 0-RTT recommendations, based on feedback from Robert Evans and @martinthomson

@huitema (Owner, Author) commented Jul 31, 2021

Addresses issues #54 and #57

@martinthomson (Collaborator) left a comment

Lots of comments, sorry.


When resuming a session, a client MAY take advantage of the 0-RTT mechanism
supported by QUIC. The 0-RTT mechanism MUST NOT be used to send data that is
not "replayable" transactions. For example, a client MAY transmit a Query as
(Collaborator)

It seems to me like you need a better definition of "replayable".

(Owner, Author)

@saradickinson I would like your advice here. The DNS operations are defined in an IANA registry that we could quote -- https://www.iana.org/assignments/dns-parameters/dns-parameters.xhtml#dns-parameters-5. The registry currently defines the following:

OpCode  Name                              Reference
0       Query                             [RFC1035]
1       IQuery (Inverse Query, OBSOLETE)  [RFC3425]
2       Status                            [RFC1035]
3       Unassigned
4       Notify                            [RFC1996]
5       Update                            [RFC2136]
6       DNS Stateful Operations (DSO)     [RFC8490]
7-15    Unassigned

Of all of these, I see the benefit of allowing Query in 0-RTT, and the danger of allowing Update or DSO. I am not sure about IQuery, Status and Notify. Advice?

(Collaborator)

Since IQuery is obsolete, I don't think there is any danger (it generates a NotImp AFAIK). Status is similar, and there was an effort 2 years ago to deprecate it, but it failed because... well, DNSOP :-( Servers respond with NotImp or Refused, or give no response at all.
Notify is worth thinking about. Since it only affects secondaries, and it can already be abused to trigger unnecessary SOA or XFR requests to a primary, most implementations already throttle the rate at which they send those requests in response to one or more Notify messages. Some operators require Notifies to be TSIG-signed before acting on them (which is also true of Update). TSIG does provide a defence against replay, but note that RFC8945 recommends a default Fudge value (i.e., time offset for validation) of 300s.

(Owner, Author)

So I think the text should say that only queries with OpCode "Query" should be considered for 0-RTT. Is there a case where some servers should not allow some specific queries, e.g., because they are some kind of "active" query that starts a script?

(Collaborator)

This is what https://datatracker.ietf.org/doc/html/draft-ietf-dprive-early-data-00 proposed, but this was questioned in the following discussion that then petered out... (an IANA registry of allowed QTYPES was proposed). I notice that that doc was adopted by DPRIVE but then seemed to stall and has now expired.

(Owner, Author)

I would rather not quote an expired draft. I also agree with Martin's general recommendation to use application-level response codes rather than QUIC errors. Whenever we create a QUIC error, implementations end up having to map it to some kind of transport semantics. How about giving servers the following choices:

  1. Process the request, if the operation type is "QUERY" and the query can be processed without side effect,
  2. Defer processing until the connection has been confirmed by the client (quote RFC 9001),
  3. Reject the request and prepare a DNS REFUSED error.
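The three choices above can be sketched as a small server-side policy function. This is a minimal illustration, not text from the draft: `choose_0rtt_action`, `EarlyDataAction` and the `side_effect_free`/`can_defer` inputs are hypothetical names, and the preference order (process, then defer, then refuse) is an assumption, since the comment lists the options without ranking them. The OpCode is read from the standard 12-byte DNS header.

```python
from enum import Enum

# OpCode value from the IANA "DNS OpCodes" registry listed above.
OPCODE_QUERY = 0
# RCODE a server would put in the response for choice 3 (RFC 1035).
RCODE_REFUSED = 5


class EarlyDataAction(Enum):
    PROCESS = 1  # choice 1: answer immediately
    DEFER = 2    # choice 2: hold until the handshake is confirmed (RFC 9001)
    REFUSE = 3   # choice 3: respond with RCODE REFUSED


def opcode_of(message: bytes) -> int:
    """Extract the 4-bit OpCode from a DNS message header (RFC 1035, Section 4.1.1)."""
    if len(message) < 12:
        raise ValueError("DNS header is 12 bytes")
    # Byte 2 holds QR (1 bit), OpCode (4 bits), AA, TC, RD.
    return (message[2] >> 3) & 0x0F


def choose_0rtt_action(message: bytes, side_effect_free: bool, can_defer: bool) -> EarlyDataAction:
    """Hypothetical policy for a DNS message that arrived in 0-RTT data.

    `side_effect_free` and `can_defer` are assumptions: the server decides
    whether this particular query has side effects and whether it is able
    to buffer requests until the handshake is confirmed.
    """
    if opcode_of(message) == OPCODE_QUERY and side_effect_free:
        return EarlyDataAction.PROCESS
    if can_defer:
        return EarlyDataAction.DEFER
    return EarlyDataAction.REFUSE
```

A server that cannot buffer early data and never treats queries as side-effect free falls through to REFUSED for every 0-RTT message, which is safe but gives up the latency benefit.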

If the WG finally produces a 0-RTT draft, it would update DoQ.

@saradickinson (Collaborator)

If we want to do a draft update to generate discussion in the WG, then I would be OK with merging the PR as long as we add "OPEN QUESTIONS:" notes on the following to the text:

huitema and others added 11 commits August 5, 2021 10:11
Co-authored-by: Martin Thomson <mt@lowentropy.net>
@martinthomson (Collaborator)

I'm happy for some of my comments to be taken to other issues for follow-up, but there are still quite a few of them that are not resolved. Sara's questions are good ones to take to the list (I think that the REFUSED + EDE(TOO_EARLY) thing belongs in the other draft, for instance).

For transparency, and to add to Sara's list:

  • I still think that this restates TLS requirements too much in a few places.
  • The anti-replay recommendations are a little confused.
  • The specific time skew recommendation is poorly justified and should probably be removed.
  • Ticket issuance and reuse advice needs to be removed, unless specific, additional stipulations over RFC 8446 can be clearly identified.

@huitema (Owner, Author) commented Aug 6, 2021

@martinthomson: the last commit tries to address your concerns. I tried removing some of the redundancy with TLS, replacing text with references to Section 8 and Appendix C.4 of TLS 1.3. I also removed the time-skew recommendation from the implementation requirements, but added a discussion of its effects in the privacy considerations. The ticket issuance recommendation is simplified to just "make the duration long enough".

I still need to work on the "replayable transactions" text.

@huitema (Owner, Author) commented Aug 6, 2021

The "tighter definition" commit revises the anti-replay considerations, aligning with our emerging consensus that servers must either delay processing, refuse transactions with EDE "too early", or just close the connection. Not perfect, but most probably good enough to have a discussion with the WG. I added the reservation of the "too early" EDE. I also tried to amend the text about freshness tests in the privacy considerations, which @martinthomson described as too optimistic. At this point, I would like to check in this PR, so we can have a basis for new discussions.
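Refusing with an Extended DNS Error means attaching an EDE option (RFC 8914) inside the EDNS(0) OPT record of a REFUSED response. The sketch below only shows the byte layout; the helper names are hypothetical, and the INFO-CODE value is an assumption labeled in the code, because at the time of this PR the "too early" code was only being reserved.

```python
import struct

OPT_RR_TYPE = 41          # EDNS(0) OPT pseudo-RR type (RFC 6891)
EDNS_OPTION_EDE = 15      # Extended DNS Error option code (RFC 8914)
# Assumption: INFO-CODE for "Too Early". This PR only reserves the code;
# 26 is the value later registered by IANA (RFC 9250). Check the registry.
EDE_TOO_EARLY = 26


def encode_ede_option(info_code: int, extra_text: str = "") -> bytes:
    """EDNS option carrying an Extended DNS Error: INFO-CODE plus optional UTF-8 EXTRA-TEXT."""
    body = struct.pack("!H", info_code) + extra_text.encode("utf-8")
    return struct.pack("!HH", EDNS_OPTION_EDE, len(body)) + body


def encode_opt_rr(options: bytes, udp_payload_size: int = 1232) -> bytes:
    """Minimal OPT pseudo-RR for the additional section of a REFUSED response.

    NAME is the root (a single zero byte). The TTL field carries extended
    RCODE, version and flags; all zero here because REFUSED (5) fits in the
    4-bit header RCODE. The payload size default is an arbitrary choice.
    """
    header = struct.pack("!HHIH", OPT_RR_TYPE, udp_payload_size, 0, len(options))
    return b"\x00" + header + options
```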

@huitema (Owner, Author) commented Aug 9, 2021

Yes, completed is sufficient. I was a bit too cautious.

I certainly agree that smaller delays are better. On the other hand, just saying small begs the question, how small? Small compared to what?

The cache attacks involve observing encrypted traffic, and then assessing the state of the cache after the client's message arrives at the recursive server. If the client request triggered a refresh of the cache, the attacker will find a TTL close to the maximum value minus the delay between assessment and replay. If the TTL is smaller than that, this means the value was probably in the cache before the replay, so the attacker cannot conclude that the target name was looked up. The generic cache attack does not require replaying 0-RTT data -- it also works against 1-RTT data.
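The inference described in the paragraph above reduces to a single comparison. This is a hypothetical attacker-side heuristic for illustration only; the function name and the `slack` parameter (to absorb whole-second TTL rounding) are assumptions, and `delay` here is the time between the query (original or replayed) reaching the resolver and the attacker's probe.

```python
def refresh_was_triggered(observed_ttl: int, max_ttl: int, delay: int, slack: int = 1) -> bool:
    """Hypothetical attacker heuristic for the generic cache attack.

    If the query caused the resolver to fetch the record, the remaining TTL
    seen by the attacker should be about `max_ttl - delay`. A noticeably
    smaller TTL suggests the entry was already cached, so nothing can be
    concluded about the target name. All values are in seconds.
    """
    return observed_ttl >= max_ttl - delay - slack
```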

The attacker can increase the efficiency of the generic cache attack by assessing the state of the cache before the client's message arrives. That's where 0-RTT replay is useful, because the attacker can choose the time at which it does the replay. Once the attacker knows the TTL of the target record in the cache, it can wait until that TTL expires and then do the replay, and get an assessment that is much less ambiguous. But of course the freshness test limits how much the attacker can delay the replay. That's what, in my mind, causes the link between the maximum tolerance of that test and the potency of the attack.
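The freshness test referred to above is the one in RFC 8446, Section 8.3. A minimal sketch, assuming millisecond timestamps and a server that stored an adjusted creation time with each ticket; the function name and the `window_ms` parameter are hypothetical, and the window value is exactly the tolerance under discussion (the thread later mentions picotls using 10 seconds).

```python
TICKET_AGE_MODULUS = 2 ** 32  # ticket ages are 32-bit millisecond counters


def zero_rtt_is_fresh(obfuscated_ticket_age: int, ticket_age_add: int,
                      adjusted_creation_time_ms: int, now_ms: int,
                      window_ms: int) -> bool:
    """Sketch of the TLS 1.3 freshness check (RFC 8446, Section 8.3).

    The client reports its view of the ticket age, masked with the
    server-chosen ticket_age_add. The server recovers that age, predicts
    when the ClientHello should have arrived, and accepts 0-RTT only if
    the actual arrival time is within `window_ms` of that prediction.
    """
    clients_ticket_age = (obfuscated_ticket_age - ticket_age_add) % TICKET_AGE_MODULUS
    expected_arrival_time = adjusted_creation_time_ms + clients_ticket_age
    return abs(now_ms - expected_arrival_time) <= window_ms
```

With a 10 s window, an attacker holding a captured 0-RTT flight has roughly 10 s to replay it before the server rejects the early data, which is the bound on replay timing described above.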

@martinthomson (Collaborator)

Yes, the question of "how small" is always "as small as possible", which is unsatisfactory.

I was thinking that if the attacker wanted to confirm the contents of a query, they would replay that query until the cache entry is evicted. If the original query caused the cache entry to be created, your logic holds, but that is not necessary to mount the attack in question.

The attacker only needs to watch for the edge caused by cache eviction. That is an increase in response time and maybe a request to the authoritative. They can then correlate that with the known cache expiry time of the record. As you say, the attacker can learn when the record expires through their own queries or by observing when the authoritative was queried; assuming that the client populated the cache is not helpful.

The true defense against this sort of thing is likely k-anonymity. If there could be k >>1 records that could have expired at that time, then the attacker only knows that one of those k records was queried for. That is hard to achieve if you have strong traffic analysis that supports linking a query from a stub to a query to an authoritative, especially when an authoritative server doesn't receive much traffic.

@huitema (Owner, Author) commented Aug 9, 2021

What is the limit to "as small as possible"? Clearly, that depends on how precise the clocks are, whether they may drift, etc. I know that in picotls, Kazuho set that to 10 seconds. Could it be smaller? 1 second? 100 milliseconds? Should we ask the TLS working group for guidance?

@huitema (Owner, Author) commented Aug 9, 2021

The importance of the issue also depends on the scenario. Are we really concerned about that attack in the recursive to authoritative scenario? And then, if we are concerned mostly about recursive resolvers, are there specific defenses that these resolvers could implement, like your description of k-anonymity? Maybe something as simple as "if I have to refresh 1 cache entry, I will refresh K entries at the same time?" Not that I would want to specify that now, but we could have some weasel text...

@huitema (Owner, Author) commented Aug 10, 2021

OK, I thought it over, and I think we have a solution for the 0-RTT issue. Basically, align with section 8 of TLS 1.3. The issue of course is that the recommended mechanisms in TLS 1.3 require shared state between all servers in a "system". In general, that's hard. But in the case of DNS over QUIC, the system is defined by "all servers that share the same DNS cache", and that's much more palatable. For example, if all servers that share the same cache can also cache the identifiers of 0-RTT requests received for the duration of the freshness checks, then we have an effective mitigation against replay attack.
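The mitigation above corresponds to the ClientHello-recording approach in RFC 8446, Section 8.2, scoped to the servers that share one DNS cache. A minimal sketch, assuming each 0-RTT attempt is identified by a unique value from the ClientHello (for example the PSK binder, as RFC 8446 suggests) and that a plain dict stands in for the shared store; the class and method names are hypothetical. Entries are kept for twice the freshness window, because a ClientHello accepted at the early edge of the window could still pass the freshness check if replayed up to one window after its expected arrival time.

```python
import time
from typing import Callable, Dict


class SharedZeroRttReplayCache:
    """Hypothetical anti-replay store for all servers sharing one DNS cache.

    A real deployment would back this with state shared by those servers;
    a dict is used here only to keep the sketch runnable.
    """

    def __init__(self, window_s: float, clock: Callable[[], float] = time.monotonic):
        # Retain identifiers for 2x the freshness tolerance (see lead-in).
        self.retention_s = 2 * window_s
        self.clock = clock
        self._expiry: Dict[bytes, float] = {}

    def _purge(self, now: float) -> None:
        # Older entries can go: replays that late fail the freshness check.
        for key in [k for k, exp in self._expiry.items() if exp <= now]:
            del self._expiry[key]

    def first_use(self, identifier: bytes) -> bool:
        """Return True the first time `identifier` is seen, False on a replay."""
        now = self.clock()
        self._purge(now)
        if identifier in self._expiry:
            return False
        self._expiry[identifier] = now + self.retention_s
        return True
```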

@huitema (Owner, Author) commented Aug 12, 2021

Merging this after private message from Sara. Adding issue #66 to flag remaining work required for specializing the privacy considerations based on roles. This PR primarily addresses issue #54 and partially addresses issue #55.

@huitema huitema merged commit 497cf57 into master Aug 12, 2021
@huitema huitema deleted the more-robust-0rtt branch August 12, 2021 23:18
3 participants