
When should server-chosen connection IDs be sent and how are they indicated? #349

Closed
RyanTheOptimist opened this issue Feb 28, 2017 · 8 comments
Labels
-transport · design (An issue that affects the design of the protocol; resolution requires consensus.) · has-consensus (An issue that the Chairs have determined has consensus, by canvassing the mailing list.)

Comments

@RyanTheOptimist
Contributor

My sense is that we'd like the server to choose the "real" connection ID to be used for a connection. Among other things, this enables servers to embed routing/load-balancing information into connection IDs.

I have heard two options for what the client should do before it has a server-chosen connection ID.

  1. Send a random connection ID
  2. Send a connection ID of 0

If we go with approach 1, then we need some mechanism that load balancers can use to detect whether a packet's connection ID is server-chosen or not. It seems likely that all packets encrypted with 1-RTT keys could use server-generated connection IDs, but for other packets it's not so clear. Consider the case of a non-0-RTT handshake. The initial client handshake packet would include a client-chosen connection ID. When the server replies, it could use a server-chosen connection ID, and when the client sends the next handshake packet it could use the server-chosen connection ID as well. But that means we'd need a mechanism to disambiguate client handshake packets carrying client-chosen connection IDs from those carrying server-chosen connection IDs. We could devote bits or packet types to indicate this. I suspect the same argument might hold for server handshake packets. We might not want the server to choose a connection ID until it is ready to accept the handshake.

One benefit of this approach, however, is that a QUIC server could choose to simply use the client's connection ID for the duration of the connection. This would mean it would not need to be able to generate a new connection ID that is guaranteed to be routed by the load balancer back to itself. In cases where the vendor of the load balancer and the server are not the same, this might be valuable.
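For concreteness, here is a minimal sketch of the kind of disambiguation a load balancer would need under approach 1. The flag position and header layout here are purely hypothetical; the draft defines no such bit, and the point is only that some explicit signal would have to exist:

```go
package lb

// flagServerChosenCID is a hypothetical bit in the first octet of the
// header marking the connection ID as server-generated. No such bit
// exists in the draft; approach 1 would require inventing one (or a
// dedicated packet type) so load balancers can tell the cases apart.
const flagServerChosenCID byte = 0x10

// cidIsServerChosen is the test a load balancer would apply before
// trusting the connection ID for routing decisions.
func cidIsServerChosen(packet []byte) bool {
	return len(packet) > 0 && packet[0]&flagServerChosenCID != 0
}
```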

If we go with approach 2, then it's obvious that any packet with a non-zero connection ID has a server-chosen connection ID (though the server still may want to verify that it's a "valid" connection ID according to the algorithm it uses).

On the other hand, this requires QUIC servers which are behind load balancers to be able to generate connection IDs which the load balancer will route back to them. If the server and load balancers are from different vendors, this might be challenging.
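Under approach 2 the load balancer's dispatch logic is correspondingly simple. A rough sketch, where the 64-bit connection ID, the embedded index encoding, and the fallback policy are all assumptions for illustration:

```go
package lb

// routeByCID sketches approach 2: a connection ID of 0 means the client
// has not yet received a server-chosen ID, so we fall back to hashing the
// 5-tuple; any non-zero ID is treated as server-chosen. A real deployment
// would also verify the ID is "valid" before trusting the embedded index.
func routeByCID(cid, fiveTupleHash uint64, servers []string) string {
	if cid == 0 {
		// Client-chosen phase: pin the flow by its 5-tuple instead.
		return servers[fiveTupleHash%uint64(len(servers))]
	}
	// Server-chosen: assume (hypothetically) the low bits carry the
	// backend index the server encoded when it picked this ID.
	return servers[cid%uint64(len(servers))]
}
```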

@igorlord
Contributor

igorlord commented Feb 28, 2017

> We might not want the server to choose a connection ID until it is ready to accept the handshake.

Or maybe the opposite -- the server can choose to assign a connection ID to this connection at the first opportunity, even if it is not yet ready to accept the connection.

An example of this is version negotiation. During new software rollout, backend servers may be running different versions of the software and hence supporting different QUIC versions. When the server is doing version negotiation, it may be very interested in ensuring that the client comes back to this backend server using the QUIC version it requested. If ConnectionID is used by the load balancers to route to backend servers, the server will likely want to generate a new ConnectionID during version negotiation and before it is ready to accept the handshake.

Of course, there must be a way to look at a packet and know that it is using the server's connection ID.

@martinthomson
Member

> If we go with approach 1, then we need some mechanism that load balancers can use to detect whether a packet's connection ID is server-chosen or not.

@RyanatGoogle (or @igorlord), can you justify the "need" part of this? You both make the assertion, but don't justify it, and I can't work out why you have this need. Are you concerned that a stateless server will be unable to pick a new value because it isn't sure whether a new value was already chosen by another instance of itself?

I think that you could make a good argument for the server being able to respond with a different connection ID on every cleartext packet it sends. As long as this resolves by the time the client starts sending short packets, multiple changes aren't a problem for the client. That requires only two things. First, the different connection IDs used by the "server" must all route to the same place for anything that depends on server-local state, i.e., only a stateless reject or version negotiation can be used to change routing. Second, the client can't move to a different source address during the handshake. I think those are reasonable constraints.

One advantage of a random value is that it might be good enough and it keeps the load balancing infrastructure quite simple. Maybe this isn't the perfect strategy, but a server could - as Google's deployment currently does in all but a few cases - let the client pick the route at random. They could stick with that unless it hurt them, and then they could trigger a stateless reject (or version negotiation if that is possible) to move any surplus clients around.

@martinthomson martinthomson added design An issue that affects the design of the protocol; resolution requires consensus. -transport labels Mar 2, 2017
@MikeBishop
Contributor

First, we can't necessarily assume that 1-RTT packets will have a server-chosen connection ID. This is true with TLS, but I think someone mentioned at the interim using a different handshake protocol that relies on a database of pre-shared keys. (I presume the client's handshake is "We're going to use key 23b4 in table 3c. Deal." or something of that variety.) In that situation, 1-RTT packets may be sent before a round trip has completed. (That was the basis for saying that the cutoff for the Version bit would be both 1-RTT keys available and version negotiation complete, because the order of those is not guaranteed.)

Second, I intuitively agree that load balancers "need" to be able to differentiate. Let me see if I can unpack that. There are several types of packets that might arrive, with different treatment by load balancers:

  1. Packet whose connection ID says it's bound for a server behind the load balancer. Great!
  2. Packet whose connection ID says it's bound for a server that's no longer behind the load balancer. Hm. Oops. Public Reset? Route to random server?
  3. Packet whose connection ID doesn't decode properly. Probably client-selected; route to random server.

Understanding whether it's a client-selected Connection ID would help differentiate (3) from random junk. It might also help differentiate (2) and (3) if the client happens to randomly select a value in the right range.

I'm having trouble being more concrete about the requirement than that, and maybe that means the requirement doesn't actually exist, and it's just a convincing mirage.
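A sketch of how those three cases might look in a load balancer's dispatch path. The `decodeCID` scheme here is invented for illustration (low byte as backend index, top bit standing in for an integrity check); nothing in the draft specifies it:

```go
package lb

import (
	"errors"
	"math/rand"
)

var errUnknownBackend = errors.New("CID decodes to a server no longer behind this LB")

// decodeCID is a stand-in for a real encoding: the low byte is the
// backend index, and the top bit plays the role of an integrity check.
func decodeCID(cid uint64) (idx int, ok bool) {
	if cid>>63 == 0 {
		return 0, false // doesn't decode
	}
	return int(cid & 0xff), true
}

// pickAny chooses an arbitrary live backend for undecodable IDs.
func pickAny(backends map[int]string) string {
	i := rand.Intn(len(backends))
	for _, addr := range backends {
		if i == 0 {
			return addr
		}
		i--
	}
	return ""
}

// dispatch mirrors the three arrival cases in the list above.
func dispatch(cid uint64, backends map[int]string) (string, error) {
	idx, ok := decodeCID(cid)
	if !ok {
		return pickAny(backends), nil // case 3: probably client-selected
	}
	if addr, present := backends[idx]; present {
		return addr, nil // case 1: bound for a live backend
	}
	return "", errUnknownBackend // case 2: reset? random server?
}
```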

@martinthomson
Member

@MikeBishop, it's true that you might be protecting with 1-RTT keys (damn I hate that we have to accommodate models that don't really exist, sorry Jana, but it's really hard to internalize something like this), but you do have to send the long header because you still need to confirm that you picked the right version. If you can start sending short headers straight away, then you lose version negotiation too (FWIW, if you are agreeing on keys through signaling, why not also agree on version and connection ID?).

I like that characterization, Mike; a mirage is good. It seemed "obvious" to me at first too, but I just can't see how it would hurt. If the load balancer is stateless, then I think that what you have described is fine. The best I've been able to come up with is that clients can then drive traffic toward a specific backend. I believe that to be possible as long as you only route on connection ID, though; given the size of the space, connection ID + source IP + source port might also have that property (that is, there aren't enough bits left over for a MAC that is strong enough to properly authenticate the connection ID).
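To illustrate the bit-budget concern: if a server packs a routable index plus a truncated MAC into a 64-bit connection ID, only a few bytes of tag remain. The split chosen here (16-bit backend index, 48-bit truncated HMAC) is an assumption for illustration, not a proposal:

```go
package lb

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/binary"
)

// makeCID packs a 16-bit backend index with 48 bits of truncated HMAC.
// 48 bits of tag is weak as authenticators go, which is the concern
// above about not having enough bits left for a strong MAC.
func makeCID(key []byte, backend uint16) uint64 {
	mac := hmac.New(sha256.New, key)
	var b [2]byte
	binary.BigEndian.PutUint16(b[:], backend)
	mac.Write(b[:])
	tag := mac.Sum(nil)[:6] // truncate the 32-byte HMAC to 48 bits

	var tagBits uint64
	for _, t := range tag {
		tagBits = tagBits<<8 | uint64(t)
	}
	return uint64(backend)<<48 | tagBits
}

// checkCID recomputes the tag for the embedded backend index; a forger
// succeeds with probability 2^-48 per guess under these assumptions.
func checkCID(key []byte, cid uint64) (backend uint16, ok bool) {
	backend = uint16(cid >> 48)
	return backend, makeCID(key, backend) == cid
}
```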

@igorlord
Contributor

igorlord commented Mar 3, 2017

This is close to how the load balancer would work (already works), with minor changes.

> 2. Packet whose connection ID says it's bound for a server that's no longer behind the load balancer.

This is actually unclear. If you mean:
  2.A. The server is no longer eligible to serve new connections, that's just like case 1 for existing connections.
  2.B. The server suddenly lost its state (went offline, rebooted, app restarted), then the load balancer may not even know this, but it is bad news for existing connections: blackhole, ICMP Host Unreachable, ICMP Port Unreachable, or PubReset.

> 3. Packet whose Connection ID doesn't decode properly.

I would really like to avoid relying on a random Connection ID failing to decode as the way to identify it as client-selected. FWIW, the load balancer has many servers behind it (possibly O(10^5) servers, for a global-scale load balancer), so it may have to be very permissive about what qualifies as "decoding".

If the load balancer can tell with certainty that it is looking at a client-selected ID:

  1. If this 5-tuple (or Connection ID, if it is not 0) is present in its local short-lived map, forward to the server identified by that map.

  2. Otherwise, choose a server for this connection using whatever policy, store this decision in the local short-lived map, and forward the packet to the chosen server.
    That short-lived map would only keep data for O(seconds) and would not be shared within the load balancer pool.

To further ensure robustness, packets with a client-selected ID would first be forwarded to one load balancer within the pool via a consistent hash (so they all land on the same LB machine). [I am omitting the extra care needed when a new LB machine is added to, or an old one removed from, the pool.]
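A condensed sketch of that short-lived map. The key shape (5-tuple string, or client CID if non-zero), the O(seconds) TTL, and the no-sharing property come from the description above; everything else, including the hash-based placement policy, is filled in as an assumption:

```go
package lb

import (
	"hash/fnv"
	"sync"
	"time"
)

type pin struct {
	backend string
	expires time.Time
}

// stickyMap is the local, O(seconds) map described above: it pins a flow
// (keyed by 5-tuple, or by client CID if non-zero) to one backend long
// enough for the handshake, and is not shared across the LB pool.
// (Eviction of expired entries is omitted for brevity.)
type stickyMap struct {
	mu   sync.Mutex
	pins map[string]pin
	ttl  time.Duration
}

func newStickyMap(ttl time.Duration) *stickyMap {
	return &stickyMap{pins: make(map[string]pin), ttl: ttl}
}

func (m *stickyMap) route(key string, backends []string) string {
	m.mu.Lock()
	defer m.mu.Unlock()
	if p, ok := m.pins[key]; ok && time.Now().Before(p.expires) {
		return p.backend // step 1: existing pin, reuse it
	}
	// Step 2: new flow; "whatever policy" here is a simple hash.
	h := fnv.New32a()
	h.Write([]byte(key))
	b := backends[int(h.Sum32())%len(backends)]
	m.pins[key] = pin{backend: b, expires: time.Now().Add(m.ttl)}
	return b
}
```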

@mirjak
Contributor

mirjak commented Mar 3, 2017

This is actually related to #185, because if you know that it is the first packet, you know it carries a client-initiated ID.

@MikeBishop
Contributor

It wasn't in Jana's list, but #361 fixes this, too, I think.

@martinthomson
Member

Yes, #361 the all-destroyer. If anyone thinks that more needs to be done here (or they disagree with the design proposed by #361), please open another issue.

@mnot mnot added the has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list. label Apr 19, 2017