Server's preferred address #1251

Merged
merged 9 commits into from
May 8, 2018

Contributor

@MikeBishop MikeBishop commented Mar 27, 2018

Fixes #560; this is a specific post-handshake handoff by the server to a new IP address. A more general ability to hand off to a different server address is left for the future, though it would probably take the form of a frame which is substantially similar to the new transport option.

One new requirement, which isn't specific to this PR but which this PR makes more obvious: if you're attempting connection migration and get a Stateless Reset on the new path, the connection is not dropped. It just means that IP routing from your new address took you to the wrong server, so your new address can't be used to continue the connection.

Contributor

@ianswett ianswett left a comment

Looks good, though I'm still a bit confused on the definition of probing/non-probing packets.

{{migrate-validate}}, but the server MUST continue sending all other packets
from its original IP address.

Once the server has received a non-probing packet on its preferred address which
Contributor

I'm still confused on what probing and non-probing packets are. Did I just miss the definition?

Contributor Author

Line 1612 defines the "probing frames" and talks about "packets containing other frames," but you're correct that there's not as clear a definition as I'd thought. Let's add that in the top-level migration PR. @janaiyengar?

Contributor

@janaiyengar Already merged that, so I'd suggest we clarify it in this PR?

Contributor

Fixed in #1253

@ianswett
Contributor

And thanks for writing this up Mike. This is a great traffic management feature, and it's great to have this in an early draft, so different groups can ensure it works as intended.

@MikeBishop MikeBishop changed the base branch from migration3 to master March 27, 2018 22:14
Contributor

@janaiyengar janaiyengar left a comment

High-order comment: I think this should be called something else, since the parallel with Client Connection Migration might be misleading.

and connection ID to reach a different server instance which does not posses the
necessary connection state. Receiving a Stateless Reset in response to a probing
packet SHOULD NOT terminate the connection, but MUST cause the endpoint to
consider path validation to have failed.
Contributor

nice catch.

Contributor

Really? The use case is that the server sends both a unicast address and a connection ID. In theory, the unicast address guarantees that the packet will reach the specific server farm that sent the redirect, and the connection ID guarantees that the load balancer will send the packet to the right server in the farm. Receiving a stateless reset is an indication that something is seriously wrong.

Or maybe you are not addressing this specific PR, but rather making a generic statement about probes. In which case I would suggest moving this to a different PR.

Contributor Author

Yes, it's something I realized when discussing the simultaneous migration case. If the client probes both the original (anycast) and preferred (unicast) address from the new interface, the result is a race between a Stateless Reset from the anycast peer and a PATH_RESPONSE from the unicast peer. Since the anycast endpoint is likely to be closer, the Stateless Reset will likely win.

However, it's not specific to this PR, so it's now #1259.
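
The Stateless Reset handling quoted in this thread can be sketched in Python as follows. This is only an illustrative model: `PathState`, `Connection`, and `on_stateless_reset` are hypothetical names, and matching a Stateless Reset to the path it arrived on is assumed to happen elsewhere.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class PathState(Enum):
    VALIDATING = auto()  # probe sent, no PATH_RESPONSE yet
    VALIDATED = auto()
    FAILED = auto()


@dataclass
class Connection:
    active_path: str                             # path currently carrying data
    paths: dict = field(default_factory=dict)    # path name -> PathState
    closed: bool = False

    def on_stateless_reset(self, path: str) -> None:
        if self.paths.get(path) is PathState.VALIDATING and path != self.active_path:
            # The reset answered a probe: the new path reached a server that
            # lacks our connection state. Fail validation, keep the connection.
            self.paths[path] = PathState.FAILED
        else:
            # Assumption: a reset on the path in use means the peer lost state.
            self.closed = True
```

With this model, a reset that wins the race against a PATH_RESPONSE only rules out the probed path; the connection continues on the path it was already using.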

A server initiates connection migration by including the preferred_address
transport parameter in the TLS handshake.

Once the handshake is finished, the client MUST initiate path validation (see
Contributor

Requiring the client to use the server's preferred address seems unnecessary.

Member

This feature is only effective if it can be relied upon. That said, it's a big deal, so we should talk about it.

Contributor Author

The server can't actually know if the client's probe fails to reach it for some reason, so it's going to have to tolerate clients failing to switch anyway. This could probably become a SHOULD without changing logic on anyone's side.

ID provided in the preferred_address transport parameter.

If path validation succeeds, the client SHOULD immediately begin sending all
non-probe packets to the new server address using the new connection ID and
Contributor

s/non-probe/non-probing/
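
As a rough illustration of the client-side rule in the quoted text, the Python sketch below picks the destination for non-probing packets once the handshake is done. `Destination` and `destination_for_data` are hypothetical names, and the validation result is assumed to come from the client's path validation logic.

```python
from typing import NamedTuple, Optional


class Destination(NamedTuple):
    address: str
    connection_id: bytes


def destination_for_data(original: Destination,
                         preferred: Optional[Destination],
                         preferred_validated: bool) -> Destination:
    """Pick where non-probing packets go after the handshake (hypothetical helper)."""
    if preferred is not None and preferred_validated:
        # Path validation to the preferred address succeeded: move over,
        # using the connection ID supplied with that address.
        return preferred
    # No preferred_address was sent, or validation has not succeeded (yet):
    # keep using the address the handshake ran on.
    return original
```

The fallback branch reflects the point made above that the server has to tolerate clients that never switch.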


Once the server has received a non-probing packet on its preferred address which
is the largest packet number seen so far, the server begins sending to the
client exclusively from its preferred IP address.
Contributor

What should the server do if a subsequent packet from the client appears on the old address?

A client might need to perform a connection migration before the server's
connection migration has completed. In this case, the client SHOULD perform
path validation to both the original and preferred server address from the
client's new address concurrently.
Contributor

This is tricky. Where should the client send data if its old address is unusable?
(I want to spend more cycles thinking about this, but I'll be out. Do make sure that all corner cases are covered; I worry that there are more here.)

Contributor

Let's keep it simple. This PR is about triggering a migration at the end of the handshake. Given the prevalence of NAT, the path challenge sent by the client to a new server address will very often arrive at the server with a different source address than the source address of the client's initial packet, and there is not much the client can do about that. The default source address of the client might also change, but from a server point of view that's just the same as any client migration. So in most cases we will see "challenge from client, response and challenge from server, response from client." Both servers and clients will have to consider whether to send data before the verification completes, but that's what they always do.

@@ -1791,6 +1811,64 @@ number. "packet_number_secret" is derived from the TLS key exchange,
as described in Section 5.6 of {{QUIC-TLS}}.


## Server Connection Migration {#migration-server}
Contributor

I don't think this should be called Server Connection Migration. There's a symmetry with Client Connection Migration that this name suggests, but that's not true, since the assumptions are quite different. How about "Using a Preferred Server Address"?

Along the same lines, I think you should state any assumptions here clearly, since these have implications. I can think of at least one: The server is not actually changing network attachment points. I believe this assumption is implicit in this design.

Member

I like (and prefer) the suggested name change.

When you say that not changing network attachment is a necessary assumption, why is that important? The new address will be validated and congestion control rebooted in the same way that any other migration would be. So even if this points to a completely different server instance (with keys being shipped across as needed), it still works.

Contributor Author

@MikeBishop MikeBishop Apr 2, 2018

I think the necessary assumption is that the server maintains the ability to send/receive from both addresses through the whole process. So if the client's attempt to hand off fails, it can still continue with the original address. In the more general P2P case, the server might not actually continue to have access to the original address, because it has actually changed network connections.

@@ -1791,6 +1811,64 @@ number. "packet_number_secret" is derived from the TLS key exchange,
as described in Section 5.6 of {{QUIC-TLS}}.


## Server Connection Migration {#migration-server}

Contributor

I would add a short description of where this is expected to be used.

Member

@martinthomson martinthomson left a comment

This needs working group review and feedback. Especially the part where support of this feature is mandatory. We need consensus that forcing clients to do this is a good thing.

(For the record, I don't think that the changes needed are big, but I'll request changes until it's clear that there is consensus to adopt the change.)

@@ -1105,6 +1107,14 @@ language from Section 3 of {{!I-D.ietf-tls-tls13}}.
};
TransportParameter parameters<22..2^16-1>;
} TransportParameters;

struct {
enum { IPv4(4), IPv6(6), (15)} ip_version;
Member

Please be consistent about snake_case and camelCase.

enum { IPv4(4), IPv6(6), (15)} ip_version;
opaque ip_address<4..2^8-1>;
uint16 port;
opaque connectionId<4..18>;
Member

To be clear, this only works if the server chooses to use a connection ID.

and connection ID to reach a different server instance which does not posses the
necessary connection state. Receiving a Stateless Reset in response to a probing
packet SHOULD NOT terminate the connection, but MUST cause the endpoint to
consider path validation to have failed.
Member

Can we make this change in a separate PR?

### Responding to Connection Migration

A server might receive a packet addressed to its preferred IP address at any
time during the connection after the handshake is completed. If this packet
Member

drop "during the connection", it's redundant with "after the handshake"

A client might need to perform a connection migration before the server's
connection migration has completed. In this case, the client SHOULD perform
path validation to both the original and preferred server address from the
client's new address concurrently.
Member

I think that this is problematic. Managing two migrations concurrently is just hard. Though your stated design works (effectively trialing all three new paths), it's unnecessarily complicated.

I would revise the client migration and forbid it (or state that it isn't going to work) until the server has finished its migration. That is, the client has to wait until after the server's migration occurs. The odds that a client is forced to migrate in this period are low enough that starting over might be better than taking on the combinatorial mess. That way leads to ICE.

Contributor Author

Well, there are two pieces. NAT rebinding means that, even if the client doesn't change its local address, the server will see a simultaneous migration. So forbidding it seems problematic, since the server can't enforce it. If it can't be totally forbidden, we could say that you SHOULD NOT attempt to perform a local migration until the handoff has completed.

I'm not convinced that abandoning one probe (because that local address is gone) and starting two in parallel is a combinatorial explosion -- it's two paths to probe, with an embedded preference between them if they both work.

Member

It increases the complexity of implementations. Right now, you probe one path, and if a new path presents itself, you abandon any previous path. Having to move from 1 to N (even if N == 2) is more complex than we need. The odds of needing to move just after the handshake are low enough that I would prefer to just say that the client can hold off on migration until after the server completes. I guess that your response is that this only affects discretionary migrations. But we can at least write that consideration down.
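
For reference, the "two paths to probe, with an embedded preference" idea discussed in this thread could look roughly like the Python sketch below. The candidate labels and `pick_server_address` are hypothetical, and the caller is assumed to re-evaluate the choice each time a probe result arrives.

```python
from typing import Dict, Optional

# Candidate server addresses in preference order (labels are illustrative).
PREFERENCE = ("preferred", "original")


def pick_server_address(results: Dict[str, Optional[bool]]) -> Optional[str]:
    """Return the most-preferred validated candidate, or None if none validated.

    results maps a candidate to True (validated), False (failed),
    or None (probe still outstanding).
    """
    for candidate in PREFERENCE:
        if results.get(candidate) is True:
            return candidate
    return None
```

This shows the preference ordering only; it does not address the implementation-complexity concern about tracking more than one outstanding probe.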

A server initiates connection migration by including the preferred_address
transport parameter in the TLS handshake.

Once the handshake is finished, the client MUST initiate path validation (see
Member

A potential problem here is that the client concludes that the handshake is finished before the server does. That leads to retransmissions of the client Finished getting confused. I can't see any concrete issues (other than the ones we already have), but you should convince yourself that it's OK.

address, the server MUST protect against potential attacks as described in
{{address-spoofing}} and {{on-path-spoofing}}. In addition to intentional
simultaneous migration, this might also occur because the client's access
network used a different NAT binding for the server's preferred address.
Member

This could be clearer: the server probes toward the client when? After receiving a probe (no?), or after receiving non-probing packets (yes?).

Contributor

Agree with Martin -- I think this needs more text.

Contributor Author

I've added some text, but it feels duplicative of {{address-spoofing}}. Concrete suggestions of "more text" would be welcome.

@@ -1558,7 +1583,7 @@ failure. Primarily, this happens if a connection migration to a new path is
initiated while a path validation on the old path is in progress.


-## Connection Migration {#migration}
+## Client Connection Migration {#migration-client}
Member

I don't think that we should make this "client" connection migration. This is the generic description, and the addition you made below is just a special case for the server that can be initiated during the handshake.

@mikkelfj
Contributor

mikkelfj commented Mar 28, 2018

This proposal is sort of like acting as one's own DNS server. I'm not against doing this, but I think there are corner cases to consider.

  • The same server can use multiple load balancers and it is normal (and slow) to have clients fail over when one load balancer goes belly up. A server that wants to move the client may want to list multiple IP addresses on the new destination. It could be resolved by randomly picking one per connection, only the server might not know that one is down.
  • Is it always the case that a server sends on the same address as it receives (via translation in path if necessary)?
  • Certs are tied to an IP, and changing IP is an attack vector on the server's configuration system, though being able to present a cert on the original address gives some level of comfort.

@MikeBishop MikeBishop changed the base branch from master to sr-during-pv April 2, 2018 17:24
@MikeBishop
Contributor Author

For simplicity, I'm inclined to say it's the server's job to send a single working IP address to the client. When we have a more complete server migration story, the server can just try multiple times / send multiple addresses.

Yes, just like in path validation, the server responds on the same address to the same address.

I don't see this as an attack, primarily because it's under the server's control. The server could use a dedicated IP per cert to do handshakes, then hand off to a shared IP for existing connections. No certs involved there.

@@ -1864,7 +1945,7 @@ draining. A key update might prevent the endpoint from moving from the closing
state to draining, but it otherwise has no impact.

An endpoint could receive packets from a new source address, indicating a client
-connection migration ({{migration}}), while in the closing period. An endpoint
+connection migration ({{migration}}}), while in the closing period. An endpoint
Member

{{{{

@MikeBishop MikeBishop changed the base branch from sr-during-pv to master April 17, 2018 23:31
@MikeBishop
Contributor Author

Removed the dependency on #1259.

Contributor

@janaiyengar janaiyengar left a comment

Thanks, Mike, just a few comments, mostly minor.


struct {
enum { IPv4(4), IPv6(6), (15)} ipVersion;
opaque ipAddress<4..2^8-1>;
Contributor

Why does this need to be as long as 255 bytes? Or am I misreading again?

Contributor Author

The length prefix has to be at least one byte to accommodate any IP addresses; since it's a byte, it can represent a length up to 255 bytes. Obviously, it wouldn't actually be 255 for IPv4 or IPv6.
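
To make the length-prefix point concrete, here is a minimal Python sketch that serializes and parses the four fields visible in the quoted struct (ipVersion, ipAddress, port, connectionId). It assumes TLS presentation-language encoding: a one-byte enum, one-byte length prefixes for the two opaque vectors, and a big-endian uint16. The function names are hypothetical, and any fields beyond those quoted in this thread are not modeled.

```python
import ipaddress
import struct


def encode_preferred_address(ip: str, port: int, connection_id: bytes) -> bytes:
    """Serialize the four fields shown in the quoted struct (hypothetical helper)."""
    addr = ipaddress.ip_address(ip)
    if not 4 <= len(connection_id) <= 18:
        raise ValueError("connectionId must be 4..18 bytes")
    raw = addr.packed  # 4 bytes for IPv4, 16 bytes for IPv6
    return (
        bytes([addr.version])            # ipVersion: IPv4(4) or IPv6(6)
        + bytes([len(raw)]) + raw        # ipAddress<4..2^8-1>: 1-byte length prefix
        + struct.pack("!H", port)        # port: uint16, network byte order
        + bytes([len(connection_id)]) + connection_id  # connectionId<4..18>
    )


def decode_preferred_address(data: bytes):
    """Parse the layout produced above; returns (ip, port, connection_id)."""
    version, ip_len = data[0], data[1]
    raw = data[2:2 + ip_len]
    expected = {4: 4, 6: 16}.get(version)
    if expected is None or ip_len != expected or len(raw) != expected:
        raise ValueError("ipVersion and ipAddress length disagree")
    offset = 2 + ip_len
    (port,) = struct.unpack_from("!H", data, offset)
    cid_len = data[offset + 2]
    cid = data[offset + 3:offset + 3 + cid_len]
    if not 4 <= cid_len <= 18 or len(cid) != cid_len:
        raise ValueError("bad connectionId length")
    return str(ipaddress.ip_address(raw)), port, cid
```

The one-byte prefix could describe up to 255 bytes, but the decoder only accepts the two lengths that match a known IP version, which is the distinction drawn in the reply above.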

TRANSPORT_PARAMETER_ERROR.
preferred_address (0x0004):

: The server's Preferred Address is used to effect a server address migration at
Contributor

s/a server address migration/change of a server address/

helps to guard against spurious migration initiated by an attacker.

Once the server has completed its path validation and has received a non-probing
packet on its preferred address which is the largest packet number seen so far,
Contributor

How about: s/has received a non-probing packet on its preferred address which is the largest packet number seen so far/has received a non-probing packet with a new largest packet number on its preferred address/
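
The switch condition under discussion can be sketched in Python as below. This is an assumption-laden model, not the draft's algorithm: `ServerAddressState` and its fields are hypothetical, and whether a packet counts as probing is taken as an input because the definition of probing packets was still being clarified in this review.

```python
from dataclasses import dataclass


@dataclass
class ServerAddressState:
    """Hypothetical per-connection state for the server's address handoff."""
    path_validated: bool = False       # server's own validation of the client path
    largest_packet_number: int = -1    # largest packet number received on any address
    using_preferred: bool = False

    def on_packet(self, packet_number: int, on_preferred: bool, is_probing: bool) -> None:
        is_new_largest = packet_number > self.largest_packet_number
        if is_new_largest:
            self.largest_packet_number = packet_number
        if (self.path_validated and on_preferred
                and not is_probing and is_new_largest):
            # From here on, send exclusively from the preferred address.
            self.using_preferred = True
```

Requiring a new largest packet number means a reordered, older packet arriving on the preferred address does not trigger the switch by itself.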

@rpaulo
Contributor

rpaulo commented Apr 27, 2018

This looks good to me!

@janaiyengar
Contributor

We've given ample time in the wg to discuss this, and all feedback has been positive. Merging.

@@ -1150,6 +1151,14 @@ language from Section 3 of {{!I-D.ietf-tls-tls13}}.
};
TransportParameter parameters<22..2^16-1>;
} TransportParameters;

struct {
enum { IPv4(4), IPv6(6), (15)} ipVersion;
Member

unbalanced space within curlies

@janaiyengar janaiyengar merged commit dc1699f into master May 8, 2018
@MikeBishop MikeBishop deleted the preferred_address branch June 14, 2018 23:38
@GuShuhengSimon

@MikeBishop, it seems this support has been removed and is now covered by the {Connection Migration} section?
If so, migrating from a server's anycast IP to a unicast IP would still not be well supported by QUIC. {Connection Migration} requires the endpoint (the server in this case) to keep a stable address and needs a full handshake to be performed, but in this case the next CHLO from the client during 1-RTT could be routed to a different server behind the anycast IP, which would make the anycast-to-unicast migration fail.

@DavidSchinazi
Contributor

Hi @GuShuhengSimon this feature is supported by QUIC. It uses the preferred_address transport parameter.

See Section 9.6 of draft-19

9 participants