Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stateless reset fixes #1346

Merged
merged 12 commits into from
May 24, 2018
Merged

Stateless reset fixes #1346

merged 12 commits into from
May 24, 2018

Conversation

martinthomson
Copy link
Member

Inspired by @martinduke's #1328, I set to cleaning a few things up. You can see separate commits here that:

  • Take Martin's simplification suggestions
  • Reword a few things based on review
  • Make stateless reset symmetric
  • Add a paragraph limiting the size of stateless reset (for amplification attack reasons)
  • Add a point that @kazuho and I discussed somewhere about the scope of the stateless reset key and connection IDs (which opens a different DoS vector if they don't line up correctly). That's partly addressed by the inclusion of an instance identifier in the calculation of the token, but the point is that you probably want instance X to be able to reset connections for instance Y if Y is out of commission, so you end up with the potential for DoS if you aren't careful to ensure that X and Y use different connection IDs.

Closes #466, #1328.

martinduke and others added 7 commits May 10, 2018 14:02
The Stateless Reset is just a bunch of random bytes except the first one and the token, so I simplified the spec to reflect that.
Improved logical ordering of paragraphs.
Copy link
Contributor

@martinduke martinduke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to send a stateless reset in response to a long header? This would not happen very often, but if so we probably ought to send SR with a Long Header, and specify that somewhere in here.

Otherwise, this looks good to me.


This design relies on the peer always sending a connection ID in its packets so
that the endpoint can use the connection ID from a packet to reset the
connection. An endpoint that uses this design cannot allow its peers to use a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"... cannot allow its peers to send packets with a zero-length destination connection ID"

@martinthomson
Copy link
Member Author

Thanks for the quick review.

Nothing specifically prevents an endpoints from sending stateless reset in response to a long header. I've opened #1348 in case I missed something, because I have to move on to something else now :(.

Copy link
Contributor

@martinduke martinduke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nit.

connection. An endpoint that uses this design cannot allow its peers to use a
zero-length connection ID.
connection. An endpoint that uses this design cannot allow its peers to send
packets with a zero-length connection ID.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"...zero-length destination connection ID"

as long as the key is valid. If instances that share a stateless reset key
allow connections with the same connection ID to be created, then the stateless
reset token for one connection could be used to terminate any connection that
has the same connection ID.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like you're trying to close #1258 and #1259, too. However, I don't think this language is strong enough. It's not just that it MUST allocate from the same space without reusing; it MUST NOT generate a stateless reset for a connection ID that it would not have allocated, even if it has the proper key to do so.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think the "instance identifier" discussed above would mitigate this. It will generate an SR, but it won't be valid from a different instance.

@martinthomson
Copy link
Member Author

Notes from editor's meeting: the last paragraph can be removed, then we will add another PR to address the routing issue underlying all this..

Copy link
Contributor

@janaiyengar janaiyengar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Just a few comments.

The message consists of a header octet, followed by random octets of arbitrary
length, followed by a Stateless Reset Token.

The endpoint SHOULD send a packet with a short header.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a strange SHOULD. The Stateless Reset packet is defined here with a short header. How can this be a long header?


The endpoint SHOULD send a packet with a short header.

Assuming a short header, the Random Octets field needs to include at least 20
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the above comment holds, remove "Assuming a short header"

header, therefore it cannot set the Destination Connection ID in the stateless
reset packet. The destination connection ID will therefore differ from the
value used in previous packets. A random Destination Connection ID makes the
connection ID appear to be the result of moving to new connection ID that was
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"moving to new connection ID" -> "moving to a new connection ID"

reset packet. The destination connection ID will therefore differ from the
value used in previous packets. A random Destination Connection ID makes the
connection ID appear to be the result of moving to new connection ID that was
provided using the NEW_CONNECTION_ID frame ({{frame-new-connection-id}}).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/the/a/

An endpoint detects a potential stateless reset when a packet with a short
header either cannot be decrypted or is marked as a duplicate packet. The
endpoint then compares the last 16 octets of the packet with the Stateless Reset
Token provided by its peer, either from the NEW_CONNECTION_ID frame or the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/from/in/
s/the NEW_CONNECTION_ID frame/a NEW_CONNECTION_ID frame/
s/server/server's/

use HMAC {{?RFC2104}} (for example, HMAC(static_key, server_id ||
that takes three inputs: the static key, the connection ID chosen by the
endpoint (see {{connection-id}}), and an instance identifier. An endpoint could
use HMAC {{?RFC2104}} (for example, HMAC(static_key, instance_id ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, incorporating an instance_id doesn't work for connections that are diverted to a new instance, which is a common case, possibly even the most common case. Server restarts are unlikely to be a common use case, since usually servers are drained before restarting, and server crashes are not frequent. Short-term routing flaps however are common. I think the text should say that for Stateless Reset to work across instances, the HMAC could exclude the instance_id. (Correct me if my understanding is wrong.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've addressed that below. But I've expanded on it a little in my latest changes. The trick here is that you can recover the instance ID using the connection ID. I will follow up with a change that includes the fixes we discussed. Part of that change involves removing the instance ID from this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And of course, the issue is that if you enable servers to SR after a routing flap, you also enable those servers to be used as oracles. The only defense is to make it impossible for the routing to be affected by the attacker.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martinthomson : Ah, right, I see it's addressed below. Given this is an example construction, I would suggest making that explicitly clear up at the beginning of this paragraph.
@MikeBishop : Yes, that is true, but defending the routing infrastructure against attacks seems out of scope for us, since similar attacks can be launched against TCP as well, where you'd receive an RST or ICMP unreachable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defending it is out of scope. Noting that you create an attack vector if you fail to defend it yourself, however, bears mentioning.

Copy link
Contributor

@janaiyengar janaiyengar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, modulo my earlier (take-it-or-leave-it) comment.

@martinthomson martinthomson merged commit f43b682 into master May 24, 2018
@martinthomson martinthomson deleted the stateless-reset-fixes branch May 24, 2018 04:52
@martinthomson
Copy link
Member Author

See #1386 for the follow-up.

@Pascalh2001
Copy link

Pascalh2001 commented May 28, 2018

Hi,
About Stateless Reset, on one side you have :

A single static key can be used across all connections to the same endpoint by
generating the proof using a second iteration of a preimage-resistant function
that takes three inputs: the static key, the connection ID chosen by the
endpoint (see {{connection-id}}), and an instance identifier.

But on the other side (section about Connection IDs) :

Short headers only include the Destination Connection ID and omit the explicit
length. The length of the Destination Connection ID field is expected to be
known to endpoints.

So if the server looses state, how do you retrieve the Destination Connection Id from a short header ?
Is it mandatory for an endpoint to use the same length for all CIDs it uses and to be able to remind this length after a hard reboot ?

('ve been reading your work and discussions since recently, great project!)

@mikkelfj
Copy link
Contributor

@Pascalh2001 you could imagine one server cluster settling on 16-bit identifiers throughout, and another cluster that uses a 4-bit type prefix from which the length can be derived, and a third cluster that register connection ID's in an in-memory database.

You save bandwidth by leaving it up to the endpoint rather than forcing the length to be transported at all times.

@MikeBishop
Copy link
Contributor

A cluster that wants the length present simply includes the length of the CID as part of the CID it tells the client to send it. The encoding inside is totally up to the server, so that might be the explicit length, or a four-bit encoding of it, or even a single bit that toggles between the two lengths the cluster typically uses.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants