
introduce a version alias mechanism #2573

Open: marten-seemann wants to merge 8 commits into master from marten-seemann:version-aliases


@marten-seemann (Contributor) commented Apr 1, 2019

This is an attempt to solve the QUIC version ossification that was discussed in Prague. Fixes #2496.

Servers can announce a list of version aliases in the transport parameters. A version alias can be any valid QUIC version number, and the server guarantees to accept this version number as an alias for the currently used version. Each version alias comes with a lifetime for which it is valid, as well as a salt that is used (instead of the version's standard Initial salt) to derive the keys protecting the Initial packet.
On subsequent connections, clients can use a version alias to establish a connection to the same server.

If this mechanism is widely deployed, middleboxes will get used to QUIC connections using version numbers from the whole version number space. Since they don't know the Initial salt used for an alias version, they can't even decrypt the Initial packet.

In its current form, this PR lacks text about the privacy implications of this proposal. A version alias is (yet another) cookie, so it should have the same properties as a token, i.e., a client shouldn't use the same version alias more than once, so that it can't be identified by on-path observers.
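A minimal Python sketch of the client side described above, assuming each alias is stored per server together with the time it was received, and is handed out at most once as suggested for privacy. `VersionAlias`, `AliasCache`, `take()` and the use of seconds for the lifetime are hypothetical choices for illustration, not part of the PR.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VersionAlias:
    version: int       # any 32-bit QUIC version number
    lifetime_s: float  # validity period announced by the server (hypothetical unit: seconds)
    salt: bytes        # Initial salt to use together with this alias
    received_at: float = field(default_factory=time.monotonic)

    def expired(self, now: float) -> bool:
        return now >= self.received_at + self.lifetime_s


class AliasCache:
    """Hypothetical per-server store that hands out each alias at most once."""

    def __init__(self) -> None:
        self._by_server: dict[str, list[VersionAlias]] = {}

    def store(self, server: str, aliases: list[VersionAlias]) -> None:
        self._by_server.setdefault(server, []).extend(aliases)

    def take(self, server: str, now: Optional[float] = None) -> Optional[VersionAlias]:
        now = time.monotonic() if now is None else now
        # Expired aliases are discarded: the client must not use them.
        fresh = [a for a in self._by_server.get(server, []) if not a.expired(now)]
        if not fresh:
            self._by_server.pop(server, None)
            return None
        alias = fresh.pop(0)  # removed from the store, so it is never reused
        self._by_server[server] = fresh
        return alias
```

Consuming the alias on use means a client that reconnects more often than the server hands out new aliases falls back to the regular version instead of reusing one, which is the token-like behaviour suggested above.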

@marten-seemann marten-seemann changed the title introduce a version alias transport parameter introduce a version alias mechanism Apr 1, 2019

@martinduke (Contributor) left a comment

This is a great technical solution, far better than the one I hoped to achieve when I opened the issue. I strongly believe we should move forward with this and hope that enough major players will support it to make it viable.

There is an unfortunate corner case: a v1-only server might pick an alias that matches v2 or some experimental version deployed on the Internet. A client that asks that server for the non-alias version will send an Initial that the server can't decrypt, which will result in connection failure rather than Version Negotiation. I think we should proceed even if we can't mitigate this.

~~~
struct {
uint32 VersionNumber;
varint Lifetime;
~~~

@martinduke (Contributor) commented Apr 1, 2019

I think we should put some sort of limit on the Lifetime. A server accidentally configured to send out very long lifetimes may create long-term problems for itself. Far better for a client to reject it as invalid! I suggest something on the order of days -- certainly no more than 10^6 seconds (about 11.6 days).

@mikkelfj (Contributor) commented Apr 1, 2019

Also, while seconds are fine, do we use seconds as a unit anywhere else? If not, milliseconds may be a better unit.

@philsbln commented Apr 1, 2019

Making the Lifetime a fixed uint32 in milliseconds would limit an alias to about 50 days (2^32 ms is roughly 49.7 days). That could be a practical solution, as the varint encoding doesn't really save space for lifetimes in the range of days.

@janaiyengar (Contributor) commented Apr 9, 2019

I like @philsbln 's suggestion of using uint32 here.
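A quick check of the numbers in this sub-thread, as a Python sketch. `MAX_ALIAS_LIFETIME_MS` and `validate_lifetime()` are hypothetical names; the cap follows the 10^6-second suggestion above rather than any agreed text.

```python
UINT32_MAX = 2**32 - 1
MS_PER_DAY = 24 * 60 * 60 * 1000

# Largest lifetime a uint32 millisecond field can carry: about 49.7 days.
uint32_limit_days = UINT32_MAX / MS_PER_DAY

# Hypothetical client-side cap following the "no more than 10^6 seconds" suggestion.
MAX_ALIAS_LIFETIME_MS = 10**6 * 1000
cap_days = MAX_ALIAS_LIFETIME_MS / MS_PER_DAY  # about 11.6 days


def validate_lifetime(lifetime_ms: int) -> bool:
    """Return True if a received lifetime is acceptable; reject suspiciously long ones."""
    if not 0 <= lifetime_ms <= UINT32_MAX:
        raise ValueError("lifetime does not fit in a uint32")
    return lifetime_ms <= MAX_ALIAS_LIFETIME_MS
```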

version. Servers SHOULD send at least one version alias, and SHOULD frequently
change the value that they announce. Each version alias contains a lifetime,
which indicates how long the server will accept this version alias. It also
contains an initial salt, which is used instead of the initial salt as defined

@martinduke (Contributor) commented Apr 1, 2019

We should think hard about using a special salt vs. the v1 salt. Marten has, almost incidentally, engineered a weak form of SNI encryption: if you really want to decode the initial, you can connect to the server yourself and likely get the salt. But a typical firewall won't be able to read the SNI anymore. This is a victory for the end-to-end principle, but perhaps a Pyrrhic one if firewalls just give up and drop QUIC. I don't have a particularly strong opinion on how to proceed, but want the WG to make this decision deliberately.

@philsbln commented Apr 1, 2019

Even without a per-alias salt, the presence of multiple QUIC versions with different salts would force a middlebox to opportunistically try decrypting Initials with each of those salts. So if we consider QUIC being blocked for too much obfuscation to be a real concern, this proposal adds to that obfuscation even without a salt.
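To show what the alias salt changes, here is a sketch of the first key-derivation step from QUIC-TLS, `initial_secret = HKDF-Extract(initial_salt, client_dst_connection_id)`, written with the standard library's HMAC (HKDF-Extract with SHA-256 is HMAC-SHA256 keyed with the salt). The all-zero `standard_salt` is a placeholder, not the real published salt, and the later HKDF-Expand-Label steps are omitted.

```python
import hashlib
import hmac
import secrets


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract (RFC 5869) with SHA-256: PRK = HMAC-SHA256(key=salt, msg=ikm)."""
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def initial_secret(salt: bytes, client_dcid: bytes) -> bytes:
    # Same derivation as with the version's published salt; only the salt differs.
    return hkdf_extract(salt, client_dcid)


dcid = secrets.token_bytes(8)          # client-chosen Destination Connection ID
standard_salt = bytes(20)              # placeholder for the version's published salt
alias_salt = secrets.token_bytes(20)   # salt the server sent along with the alias
```

A middlebox that only knows the published salt derives a different `initial_secret`, so it cannot remove Initial packet protection, which is the effect described above.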

@mikkelfj (Contributor) commented Apr 1, 2019

Interesting, but middleboxes can scan for known unsalted versions and block some of these. This effectively stops these versions, because salting isn't reliable. Furthermore, a middlebox can drop any Initial packet it doesn't understand, forcing the client to retry with an unsalted version.

It can still work for less aggressive middleboxes, but perhaps it is better to wait for the fully encrypted solution Kazuho and Huitema have been working on.

@kazuho (Member) commented Apr 1, 2019

In addition to what @mikkelfj points out, a middlebox can inject a VN and force the client to retransmit the Initial packet in the version the middlebox prefers. Note that such an attack would not be noticed, because we are removing downgrade prevention from the VN design.

I think that the proposed approach works as an anti-ossification mechanism, but I do not think it's anything more than that.

@marten-seemann (Contributor, Author) commented Apr 1, 2019

> In addition to what @mikkelfj points out, a middlebox can inject a VN and force the client to retransmit the Initial packet in the version the middlebox prefers. Note that such an attack would not be noticed, because we are removing downgrade prevention from the VN design.

That's correct, until we define a QUIC version that has downgrade protection (assuming that this version contains the version alias mechanism proposed here). Many people I've spoken to seem to want to quickly define a QUIC v2 that's identical to QUIC v1, with version negotiation being the only addition to the protocol. In that case, this attack would become infeasible as soon as v2 is deployed, because middleboxes have no way of distinguishing between a v1 and a v2 alias version.

@kazuho (Member) commented Apr 1, 2019

Then, I might prefer calling this a v2 issue.

If we are to start running v2 as soon as (or even before) v1 gets finalized, possibly with an anti-ossification scheme like this (that comes along with downgrade protection), then I do not think we need to discuss inclusion of this PR in v1?

@marten-seemann (Contributor, Author) commented Apr 1, 2019

> If we are to start running v2 as soon as (or even before) v1 gets finalized, possibly with an anti-ossification scheme like this (that comes along with downgrade protection), then I do not think we need to discuss inclusion of this PR in v1?

I disagree with that. Just because this mechanism isn't perfect doesn't mean that it's not valuable in preventing ossification. And, as @mikkelfj pointed out, middleboxes can always just drop packets they don't like, so no mechanism will ever be perfect.
Preventing ossification of the version number is valuable whether v1 is rolled out alone or together with v2. In addition, should designing v2 take longer than we now hope, this mechanism will hopefully get middleboxes used to the fact that valid QUIC traffic actually uses the whole range of version numbers.

servers announce a list of version numbers that they interpret as an alias for
the version number used in this draft. Alias versions MUST NOT be a reserved
version. Servers SHOULD send at least one version alias, and SHOULD frequently
change the value that they announce. Each version alias contains a lifetime,

@mikkelfj (Contributor) commented Apr 1, 2019

If we have this, is there any point to the concept of reserved versions at all? Perhaps this depends on future version negotiation.

@MikeBishop (Contributor) commented Apr 18, 2019

This feature is being proposed to grease middleboxes; reserved versions exist to grease the peer endpoint.
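The distinction can be made concrete: reserved versions follow the pattern 0x?a?a?a?a so that endpoints exercise version negotiation, so an alias drawn from the version space has to avoid that pattern (and 0, which marks Version Negotiation packets). A minimal Python sketch; the function names are illustrative.

```python
def is_reserved_version(version: int) -> bool:
    """True for versions matching 0x?a?a?a?a, reserved for greasing endpoints."""
    return (version & 0x0F0F0F0F) == 0x0A0A0A0A


def is_valid_alias(version: int) -> bool:
    # Must fit in 32 bits, must not be 0 (Version Negotiation), must not be reserved.
    return 0 < version < 2**32 and not is_reserved_version(version)
```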


@martinthomson (Member) commented Apr 1, 2019

@marten-seemann can you open an issue to track this?

~~~
struct {
uint32 VersionNumber;
varint Lifetime;
~~~

@dtikhonov (Contributor) commented Apr 1, 2019

varint is not part of the TLS Presentation Language.

@marten-seemann (Contributor, Author) commented Apr 2, 2019

Right. I was hoping for some help from people more familiar with the TLS presentation language on how to write it. We're using varints for all the other transport parameters, but we successfully managed to never actually use presentation language.
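For reference, the varint the struct refers to is QUIC's own variable-length integer encoding, in which the two most significant bits of the first byte give the encoded length (1, 2, 4 or 8 bytes). A minimal Python sketch of an encoder and decoder; the function names are illustrative.

```python
def encode_varint(value: int) -> bytes:
    """QUIC variable-length integer: the top two bits of the first byte encode the length."""
    if value < 0:
        raise ValueError("varints are unsigned")
    for length, prefix in ((1, 0b00), (2, 0b01), (4, 0b10), (8, 0b11)):
        usable_bits = 8 * length - 2
        if value < 1 << usable_bits:
            return (value | (prefix << usable_bits)).to_bytes(length, "big")
    raise ValueError("value exceeds 2^62 - 1")


def decode_varint(data: bytes) -> tuple[int, int]:
    """Return (value, number of bytes consumed)."""
    length = 1 << (data[0] >> 6)
    value = int.from_bytes(data[:length], "big") & ((1 << (8 * length - 2)) - 1)
    return value, length
```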

@martinduke (Contributor) commented Apr 2, 2019

@martinthomson I believe this targets #2496.

@martinthomson (Member) left a comment

As noted on the list, a few major items:

  • This might be better as a NEW_TOKEN addition.
  • This needs more thorough treatment of linkability.
  • Some discussion of how to generate these values would be advisable. That doesn't need to include the tricky bit, which is how you manage this in anticipation of the deployment of a new version, but it does need to have at least an example of how to do this without messing it up.
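As an illustration of the last point, a sketch of one way to generate alias values without the obvious mistakes, assuming a server that draws aliases at random and knows the set of versions it really supports (`supported` is a hypothetical parameter). It rejects 0, reserved 0x?a?a?a?a versions and any real version; it cannot rule out collisions with versions deployed elsewhere on the Internet, the corner case raised earlier.

```python
import secrets


def is_reserved_version(version: int) -> bool:
    return (version & 0x0F0F0F0F) == 0x0A0A0A0A


def new_alias(supported: set[int], lifetime_ms: int) -> tuple[int, int, bytes]:
    """Draw a fresh (version, lifetime_ms, salt) triple by rejection sampling."""
    while True:
        version = secrets.randbits(32)
        if version == 0:                  # 0 identifies Version Negotiation packets
            continue
        if is_reserved_version(version):  # reserved for greasing endpoints
            continue
        if version in supported:          # never alias a version we actually speak
            continue
        return version, lifetime_ms, secrets.token_bytes(20)
```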
~~~
struct {
uint32 VersionNumber;
varint Lifetime;
opaque InitialSecret<20>;
~~~

@martinthomson (Member) commented Apr 9, 2019

You want `opaque InitialSalt[20];` or `opaque InitialSalt<20..32>;`, with appropriate values for the minimum and maximum.

Suggested change:
~~~
- opaque InitialSecret<20>;
+ opaque initial_secret[20];
~~~
Resolved thread on draft-ietf-quic-transport.md (outdated)
make the lifetime a uint32
Co-Authored-By: marten-seemann <martenseemann@gmail.com>
Resolved thread on draft-ietf-quic-transport.md (outdated)
change the lifetime to milliseconds
Co-Authored-By: marten-seemann <martenseemann@gmail.com>
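With the lifetime as a uint32 in milliseconds and the fixed-size salt suggested above, each alias entry has a fixed size (4 + 4 + 20 = 28 bytes), so no per-entry length prefix is needed. The Python sketch below assumes the transport parameter value is simply the concatenation of these entries; that framing, and the function names, are assumptions for illustration.

```python
import struct

# Network byte order: uint32 version, uint32 lifetime (ms), 20-byte Initial salt.
ENTRY = struct.Struct("!II20s")


def encode_aliases(aliases: list[tuple[int, int, bytes]]) -> bytes:
    out = bytearray()
    for version, lifetime_ms, salt in aliases:
        if len(salt) != 20:
            raise ValueError("salt must be exactly 20 bytes")
        out += ENTRY.pack(version, lifetime_ms, salt)
    return bytes(out)


def decode_aliases(data: bytes) -> list[tuple[int, int, bytes]]:
    if len(data) % ENTRY.size:
        raise ValueError("truncated version alias entry")
    return [ENTRY.unpack_from(data, off) for off in range(0, len(data), ENTRY.size)]
```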
@philsbln left a comment

I just realised that requiring the server to announce (some of the) version aliases over the whole lifetime may give us downgrade protection for free. That also allows us to keep version alias and version downgrade protection separate.


Clients SHOULD remember the aliases and use it for subsequent connections to the
same server in the future. This applies to both 0-RTT connection as well as
connections that don't use 0-RTT.

@philsbln commented Apr 9, 2019

We should add a note that version aliases must not be advertised during version negotiation.

Suggested change:
~~~
+ Version aliases MUST NOT be advertised in Version Negotiation packets to avoid conflicts with future versions and experiments.
~~~
currently used versions. This transport parameter is only sent by the server.
Every version alias contains a lifetime in milliseconds. The alias is only valid
for that lifetime, clients MUST NOT use it after expiry.

@philsbln commented Apr 9, 2019

Should we require the server to announce (some) of the version aliases during their whole lifetime to enable the client to detect downgrade attacks?

@martinthomson (Member) commented Apr 10, 2019

If the idea is that this lifetime is also a commitment to support a given version, then this isn't really a fully effective downgrade-prevention mechanism. I expect that we'll see something good on that shortly.

I would instead prefer to say that the server is required to honour the alias for the entire advertised lifetime unless it also stops supporting that QUIC version.

@philsbln commented Apr 10, 2019

I don't consider this a fully effective downgrade-prevention mechanism, but it gives the client a way to detect on-path version tampering: if the client gets a Version Negotiation packet in response to an Initial sent with an alias version, and then receives that same alias again, it can be sure that someone is performing a downgrade attack (or that the server is horribly broken).
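The detection rule described here can be sketched in a few lines of Python, assuming the server is required to keep announcing an alias for its whole lifetime (the proposal in this sub-thread, not agreed text). `AliasAttempt` and `suspect_downgrade()` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class AliasAttempt:
    alias: int                     # alias version the client used in its Initial
    got_version_negotiation: bool  # whether a VN packet came back in response


def suspect_downgrade(attempt: AliasAttempt, advertised_after_fallback: list[int]) -> bool:
    """Flag possible on-path tampering.

    A server that honours an alias for its whole lifetime would not answer it with
    Version Negotiation and then re-announce the same alias in the connection the
    client fell back to.
    """
    return attempt.got_version_negotiation and attempt.alias in advertised_after_fallback
```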

@mikkelfj (Contributor) commented Apr 10, 2019

I'm a bit concerned about unbounded numbers of aliases being dumped on the client. If the alias is tied to a token, a client can just drop excess tokens and aliases, but if the design somehow evolves into a many-headed alias thing, there could be some abuse/resource issues.

Wouldn't it be reasonable to limit this to a single alias?

A future version linked to downgrade protection and version negotiation could have a more complicated pattern, with more time for analysis and design.

@philsbln commented Apr 10, 2019

I think the whole version alias mechanism must be strictly optional for client and server. If it is used, it may provide an early signal of version negotiation tampering.
The server MUST NOT assume that the client uses any of the version aliases, but the server MUST accept all version aliases it presented the client with during their lifetime unless it also stops supporting the aliased QUIC version (thanks @martinthomson ).
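A minimal Python sketch of this server-side rule, assuming the server keeps a table of every alias it has announced, keyed by alias version. `ServerAliasTable` and its methods are hypothetical; lifetimes are in seconds for brevity.

```python
from typing import Optional


class ServerAliasTable:
    """Hypothetical record of announced aliases: alias -> (real_version, expires_at, salt)."""

    def __init__(self) -> None:
        self._aliases: dict[int, tuple[int, float, bytes]] = {}

    def announce(self, alias: int, real_version: int, lifetime_s: float,
                 salt: bytes, now: float) -> None:
        self._aliases[alias] = (real_version, now + lifetime_s, salt)

    def resolve(self, version: int, now: float) -> Optional[tuple[int, bytes]]:
        """Map an incoming version to (real_version, initial_salt), or None."""
        entry = self._aliases.get(version)
        if entry is None:
            return None
        real_version, expires_at, salt = entry
        if now >= expires_at:
            del self._aliases[version]  # expired aliases no longer have to be accepted
            return None
        return real_version, salt

    def retire_version(self, real_version: int) -> None:
        """The one case in which unexpired aliases may stop being honoured."""
        self._aliases = {a: e for a, e in self._aliases.items() if e[0] != real_version}
```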

@kazuho (Member) commented Apr 12, 2019

Asking mostly out of curiosity: do we need to scramble (or protect) other fields of a long header packet as well?

The (original) argument for having a version alias is to protect against accidental ossification of the protocol. That could happen on the other fields of v1 packet as well; namely the non-invariant fields of the first octet and the CIDL fields. If we scramble CIDL fields, it would not matter if the rest of the fields (e.g., token length) were kept as-is, assuming that the CID lengths have enough randomness to randomize the position of such fields.

Though the downside would be that servers would no longer be able to use the token to decipher the actual value of version number field.

@mikkelfj (Contributor) commented Apr 12, 2019

That could be problematic for load balancers. But if they don’t know the version it might be bad anyway?

@marten-seemann (Contributor, Author) commented Apr 12, 2019

> That could happen on the other fields of v1 packet as well; namely the non-invariant fields of the first octet and the CIDL fields. If we scramble CIDL fields, it would not matter if the rest of the fields (e.g., token length) were kept as-is, assuming that the CID lengths have enough randomness to randomize the position of such fields.

The connection ID length fields as well as the connection IDs themselves are part of the invariants.

@kazuho (Member) commented Apr 12, 2019

@marten-seemann Maybe I should have been more specific. The remaining vectors that are prone to ossification are:

  • non-invariant bits of the first octet
  • the restrictions on how CIDL is used in v1
  • the Token Length field (the position is dependent on the values of CIDL fields)
  • the Length field (the position is dependent on the values of CIDL fields and the Token Length field)

For example, I would not be surprised to see a middlebox checking that the Token Length field contains a reasonable value (when viewed as a v1 packet) regardless of the value of the version number field, in an attempt to drop suspicious QUIC packets.
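To illustrate why these fields are fragile, here is a Python sketch that locates the Token Length field of an Initial packet. It assumes the long-header layout later published in RFC 9000 (separate one-byte DCID and SCID length fields), which differs from the draft layout discussed here; the point is only that the offset depends on both connection ID lengths, so a middlebox checking Token Length has to parse them first. The function name is hypothetical.

```python
def token_length_offset(packet: bytes) -> int:
    """Offset of the Token Length field in a long-header Initial packet (RFC 9000 layout)."""
    if len(packet) < 6 or (packet[0] & 0x80) == 0:
        raise ValueError("not a long-header packet")
    pos = 5                    # skip the first byte and the 4-byte version
    pos += 1 + packet[pos]     # DCID length byte plus the DCID itself
    if pos >= len(packet):
        raise ValueError("truncated packet")
    pos += 1 + packet[pos]     # SCID length byte plus the SCID itself
    if pos >= len(packet):
        raise ValueError("truncated packet")
    return pos
```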

@nibanks (Member) commented Apr 12, 2019

I do fear that this feature will make a generic, independent DDoS solution practically impossible. The more I talk to folks on the DDoS side, the more push back I get from any type of coordination between backend servers and the DDoS device. If there is no coordination, then the device will not understand these aliased version numbers; and therefore will not be able to reply with a version specific response (Retry).

The only course of action the device will have is to send back a VN packet with a reserved field, and drop the incoming packets for new connections. This is definitely not ideal, as I have no idea how clients would use this for back-off logic. I'd assume they'd immediately fall back to H2, but when would they try QUIC again?

@marten-seemann (Contributor, Author) commented Apr 12, 2019

> I do fear that this feature will make a generic, independent DDoS solution practically impossible. The more I talk to folks on the DDoS side, the more push back I get from any type of coordination between backend servers and the DDoS device. If there is no coordination, then the device will not understand these aliased version numbers; and therefore will not be able to reply with a version specific response (Retry).

We had this discussion on the NY interim, and we concluded that it's a design feature of QUIC that middleboxes can't send retries unless they coordinate with the server. That's why we introduced the original_connection_id transport parameter, in which the server has to prove knowledge of the connection ID that the client initially used for its connection attempt. If a middlebox wants to send a Retry, it therefore must either communicate this connection ID directly to the server, or (which is the more practical solution) share a key with the server and encode it into the token.

To decode the version number of greased versions, it's not hard to imagine a similar arrangement between the middlebox and the server. Defining an algorithm to do so safely would probably belong in the LB draft.
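One possible arrangement, sketched in Python under loudly hypothetical assumptions: the server and the middlebox share a key, and aliases and salts are derived per time epoch with HMAC-SHA256, so the middlebox can recognise an alias (here, from the current or previous epoch) without per-connection state. This scheme is not from the PR or the LB draft, and a real design would also have to skip versions the server actually supports.

```python
import hashlib
import hmac
import struct


def _prf(key: bytes, label: bytes, epoch: int) -> bytes:
    return hmac.new(key, label + struct.pack("!Q", epoch), hashlib.sha256).digest()


def alias_for_epoch(key: bytes, epoch: int) -> tuple[int, bytes]:
    """Derive (alias_version, initial_salt) for an epoch from a shared key (hypothetical)."""
    counter = 0
    while True:
        digest = _prf(key, b"alias" + bytes([counter]), epoch)
        version = int.from_bytes(digest[:4], "big")
        # Skip 0 (Version Negotiation) and reserved 0x?a?a?a?a versions.
        if version != 0 and (version & 0x0F0F0F0F) != 0x0A0A0A0A:
            break
        counter += 1
    salt = _prf(key, b"salt", epoch)[:20]
    return version, salt


def middlebox_recognises(key: bytes, version: int, epoch: int) -> bool:
    """Check the current and previous epoch, so aliases survive an epoch rollover."""
    candidates = [epoch] + ([epoch - 1] if epoch > 0 else [])
    return any(alias_for_epoch(key, e)[0] == version for e in candidates)
```

The epoch could be, for example, `int(time.time() // 3600)`, giving hourly aliases; checking the previous epoch covers aliases handed out just before a rollover.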

@nibanks (Member) commented Apr 12, 2019

@marten-seemann I agree that it's possible to coordinate these things, but the DDoS device is meant to be a lightweight, plug-and-play device in the path that can very efficiently get rid of unwanted traffic, all in hardware. The folks in Azure that own the functionality are extremely hesitant to add any additional complexity there that adds additional points of failure. Any kind of protocol between the backend servers and that box increases the complexity of such a device by an order of magnitude. It will take a lot of time to get such a complete solution in place.

In the meantime, we still need a solution beyond the default UDP rate-limiting algorithms that exist today. We were considering a simple connection close with a server-busy error code, but that's not possible with this design. So the only invariant solution would be a practically empty VN response.

@martinthomson (Member) commented Apr 13, 2019

A server that uses a load balancer can either not use this mechanism (i.e., it's free) or provide whatever aliasing the load balancer supports.

The point is to have the feature so that some servers support it. Not all of them have to.

@martinduke (Contributor) commented Apr 18, 2019

I am not sure why this affects load balancing at all. The CID is part of the invariants, so LBs should be able to read and parse the CID without regard for the other versions.

There is a legitimate coordination problem for DDoS boxes, I agree. However, I'm pretty comfortable for this kind of mechanism not working in cloud deployments -- we're just looking for some big players and the browsers to support this. There's basically no benefit for endpoints except preserving extensibility of the protocol.
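The point about invariants can be illustrated with a Python sketch that reads the version and Destination Connection ID from a long-header packet without interpreting the version, so it works the same whether or not the version is an alias. It assumes the invariant layout later published in RFC 8999 (first byte with the high bit set, 32-bit version, one-byte DCID length, DCID); the function name is hypothetical.

```python
def long_header_dcid(datagram: bytes) -> tuple[int, bytes]:
    """Return (version, DCID) using only version-independent long-header fields."""
    if len(datagram) < 6 or not (datagram[0] & 0x80):
        raise ValueError("not a long-header packet")
    version = int.from_bytes(datagram[1:5], "big")  # may be an alias; not interpreted
    dcid_len = datagram[5]
    dcid = datagram[6:6 + dcid_len]
    if len(dcid) != dcid_len:
        raise ValueError("truncated destination connection ID")
    return version, dcid
```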

@MikeBishop (Contributor) left a comment

Needs some of the extra text others have noted, plus some agreement issues.

While we will eventually need to write a full VN specification, it might also be worth mentioning some implications here. While we're not specifying the generation algorithm, let's be pedantic and say that the server MUST NOT advertise as an alias a version number for some other version it actually speaks. As a corollary, if the server advertises as an alias some version number that the client actually supports, the client SHOULD assume the server doesn't support that version.

in section 5.2 of {{QUIC-TLS}}. The list of version aliases is sent in the
server's Transport Parameters (see {{transport-parameter-definitions}}).

Clients SHOULD remember the aliases and use it for subsequent connections to the

@MikeBishop (Contributor) commented Apr 18, 2019

Antecedent agreement. Many possibilities:

  • ...remember the aliases and use them
  • ...remember the list of aliases and use it
  • ...remember one of the aliases and use it
version_aliases (0x000e):

: A list of version numbers that the server accepts as an alias for the
currently used versions. This transport parameter is only sent by the server.

@MikeBishop (Contributor) commented Apr 18, 2019
Suggested change:
~~~
- currently used versions. This transport parameter is only sent by the server.
+ currently used version. This transport parameter is only sent by the server.
~~~
~~~~
@@ -4115,6 +4130,25 @@ preferred_address (0x000d):
~~~
{: #fig-preferred-address title="Preferred Address format"}

version_aliases (0x000e):

: A list of version numbers that the server accepts as an alias for the
~~~~

@MikeBishop (Contributor) commented Apr 18, 2019
Suggested change:
~~~
- : A list of version numbers that the server accepts as an alias for the
+ : A list of version numbers that the server accepts as aliases for the
~~~
: A list of version numbers that the server accepts as an alias for the
currently used versions. This transport parameter is only sent by the server.
Every version alias contains a lifetime in milliseconds. The alias is only valid
for that lifetime, clients MUST NOT use it after expiry.

@MikeBishop (Contributor) commented Apr 18, 2019
Suggested change:
~~~
- for that lifetime, clients MUST NOT use it after expiry.
+ for that lifetime. Clients MUST NOT use an expired alias.
~~~

Comma splice.

## Version Aliases

In order to avoid ossification of the version number defined by this draft,
servers announce a list of version numbers that they interpret as an alias for

@MikeBishop (Contributor) commented Apr 18, 2019
Suggested change:
~~~
- servers announce a list of version numbers that they interpret as an alias for
+ servers announce a list of version numbers that they interpret as aliases for
~~~

MikeBishop and others added some commits Apr 18, 2019

@marten-seemann marten-seemann force-pushed the marten-seemann:version-aliases branch from 1eeb14e to d887ca7 Apr 18, 2019

@marten-seemann (Contributor, Author) commented Apr 18, 2019

Thanks for the suggestions, @MikeBishop. I applied all of them.

@mikkelfj (Contributor) commented Apr 19, 2019

Version aliases can be used to track connections that are frequent visitors.

@mikkelfj (Contributor) commented Apr 19, 2019

> I am not sure why this affects load balancing at all. The CID is part of the invariants, so LBs should be able to read and parse the CID without regard for the other versions.

If the pool of servers handling one version is different from another pool, for example in a blue/green deployment. Since client decides ODCID, you have nothing but the version to route on. You could, however, use a retry to get a new DCID so traffic will route correctly.

EDIT: this assumes that Retry works across versions - do we have anything that prevents or encourages that?

@marten-seemann (Contributor, Author) commented Apr 19, 2019

> If the pool of servers handling one version is different from another pool, for example in a blue/green deployment. Since client decides ODCID, you have nothing but the version to route on. You could, however, use a retry to get a new DCID so traffic will route correctly.

We already have text that you shouldn't use the ODCID for routing anyway. This applies especially in a setup like this.

@mikkelfj (Contributor) commented Apr 19, 2019

If you can recognize the ODCID as being a Retry DCID you could.
For a truly original ODCID you shouldn't, I agree.
But it is not so much about blame, as it is about what LB can do and need to do.

@marten-seemann (Contributor, Author) commented Apr 19, 2019

> If you can recognize the ODCID as being a Retry DCID you could.

If it's a Retry DCID, it's by definition not an ODCID any more (the O means "original").

@mikkelfj (Contributor) commented Apr 19, 2019

Yes, but it is the DCID in the initial packet. A load balancer would not trivially be able to tell the difference between the DCID of two initial packets since there is no RetryInitial packet type.

@mikkelfj (Contributor) commented Apr 19, 2019

Or rather, there is, since the token length field would be non-zero (EDIT: no, because LB can't read it)

But what are we arguing about here? I'm just saying a Retry could be used to route to different version, but only if Retry is able to work across those versions.

version. A server MUST NOT advertise an alias version number for a version that
it actually supports. If the server advertises an alias version number that the
client actually supports, the client MUST assume the server doesn't support
that version and ignore the alias.

@MikeBishop (Contributor) commented Apr 19, 2019

I don't know that the client has to ignore the alias, per se -- it just needs to be cognizant of what the interpretation of that version will be on that server. On the other hand, if there's a possibility that the server might acquire support for that version between flights (server upgrade case), perhaps it's better that the client just avoids it entirely after all.
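A sketch of the client behaviour in the quoted text, assuming the client simply splits the advertised list into aliases it can use and versions it speaks itself that the server evidently treats as aliases (and therefore does not support). Function and parameter names are hypothetical.

```python
def split_aliases(advertised: list[int], client_supported: set[int]) -> tuple[list[int], set[int]]:
    """Return (usable_aliases, versions_the_server_does_not_support)."""
    # An alias that is also a real version the client speaks is never used as an alias;
    # this also avoids ambiguity if the server later gains real support for it.
    usable = [v for v in advertised if v not in client_supported]
    not_supported_by_server = {v for v in advertised if v in client_supported}
    return usable, not_supported_by_server
```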

@MikeBishop (Contributor) commented Apr 19, 2019

> If the pool of servers handling one version is different from another pool, for example in a blue/green deployment. Since client decides ODCID, you have nothing but the version to route on. You could, however, use a retry to get a new DCID so traffic will route correctly.

> We already have text that you shouldn't use the ODCID for routing anyway. This applies especially in a setup like this.

Per #2026, we don't have such text, actually.
