Description of the use of Preferred Address is unclear #3353

Closed
kazuho opened this issue Jan 16, 2020 · 18 comments · Fixed by #3589
Labels
-transport design An issue that affects the design of the protocol; resolution requires consensus. has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list.

Comments

@kazuho
Member

kazuho commented Jan 16, 2020

At the moment, section 9.6.1 states:

Once the handshake is finished, the client SHOULD select one of the two server's preferred addresses and initiate path validation (see Section 8.2) of that address using the connection ID provided in the preferred_address transport parameter.

If path validation succeeds, the client SHOULD immediately begin sending all future packets to the new server address using the new connection ID and discontinue use of the old server address. If path validation fails, the client MUST continue sending all future packets to the server's original IP address.

I think these paragraphs have two issues:

  • It begins with "once the handshake is finished," but I am not sure if we define when a handshake finishes. I think we should change this to "once the handshake is confirmed," as that is the point where we allow connection migration.
  • It states that the connection ID to be used on the new path would be the one provided in TP.preferred_address. I think that is incorrect, as the server might have requested retirement of the CID provided by that transport parameter by the time the client initiates (or finishes) path validation. It should either be the CID provided by that transport parameter, or an unused CID provided by the server (i.e. through NCID frames). In fact, when the server sends an NCID frame with Retire Prior To set to a value greater than 1, the client should not be using the CID provided by the transport parameter. Note that this problem is orthogonal to Which DCID do Handshake retransmissions use? #3348, because these actions in relation to the preferred address happen at some point after the handshake is confirmed.
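
As a rough illustration of the Retire Prior To point above, here is a minimal Python sketch. It assumes, per the QUIC transport draft, that the connection ID carried in preferred_address has sequence number 1. The function and constant names are hypothetical and not taken from any implementation.

```python
# Assumption: the CID carried in the preferred_address transport parameter
# has sequence number 1 (the handshake CID has sequence number 0).
PREFERRED_ADDRESS_CID_SEQ = 1


def must_retire(seq: int, retire_prior_to: int) -> bool:
    """Return True if a CID with this sequence number falls below Retire Prior To."""
    return seq < retire_prior_to


def can_use_preferred_address_cid(retire_prior_to: int) -> bool:
    """Return True if the CID from preferred_address is still usable for migration."""
    return not must_retire(PREFERRED_ADDRESS_CID_SEQ, retire_prior_to)
```

With Retire Prior To set to 0 or 1 the transport parameter's CID remains usable; with any value greater than 1 the client would need another unused CID to migrate.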
@ianswett
Contributor

ianswett commented Jan 16, 2020

This text precedes a clear definition of handshake confirmed, so I strongly agree with your first suggested clarification.

I can imagine cases when the CID should be based on the path, so I'm less clear on the correct behavior for the second point. By definition, if the CID provided in the transport param has been retired it should not be used to initiate path validation. But I'm unclear if the preferred behavior is don't migrate if you haven't already initiated migration or to use one of the newer CIDs.

@kazuho
Member Author

kazuho commented Jan 16, 2020

@ianswett Thank you for your comments.

I can imagine cases when the CID should be based on the path, so I'm less clear on the correct behavior for the second point. By definition, if the CID provided in the transport param has been retired it should not be used to initiate path validation. But I'm unclear if the preferred behavior is don't migrate if you haven't already initiated migration or to use one of the newer CIDs.

First of all, let me state that we do not need to forbid such a design.

However, in a design that expects CIDs to be specific to the server address being used, a server cannot issue a new CID until the migration to the preferred address completes. This is because if a server sends an NCID frame from the original server address before the client completes migration to the preferred address, the server cannot tell whether the client will use that issued CID on the original path (this happens when the client fails to migrate to the preferred address) or on the migrated path.

Therefore, this issue does not have any effect on such a design.

The question at stake is the client behavior we want to recommend, when the server sends a new CID (or retires CIDs) before migration to the preferred address completes.

My view is that TP.preferred_address is a mechanism of specifying an alternative IP address, that "happens" to also carry a new CID, so that the client can always have an unused CID in hand when it initiates migration to the preferred address.

It is actually simple to implement it this way. What you would do is this:

  • When receiving TP.preferred_address, store preferred_address.CID exactly the same way as you would do when receiving an NCID frame, and separately store preferred_address.IP_address.
  • When handshake is confirmed, if preferred_address.IP_address is stored, initiate a migration to that address, using a CID from a bucket that holds the unused CIDs.

As stated above, such a design would work perfectly fine with servers issuing CIDs specific to server addresses, because a client would have only one unused CID to pick from when talking with such a server.

The design is also simpler than having special case code that associates preferred_address.IP_address and preferred_address.CID, and handles retirement cleanly.
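
The two steps above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the class, its method names, and the choice to pick the lowest unused sequence number are all hypothetical, and a real client would also send RETIRE_CONNECTION_ID frames and run path validation, which is omitted here.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ClientCidState:
    """Hypothetical client-side bookkeeping; every name here is illustrative."""
    unused_cids: Dict[int, bytes] = field(default_factory=dict)  # seq -> CID
    preferred_ip: Optional[str] = None

    def on_new_connection_id(self, seq: int, cid: bytes, retire_prior_to: int = 0) -> None:
        # Store the new CID, then drop any unused CID below Retire Prior To.
        self.unused_cids[seq] = cid
        for old in [s for s in self.unused_cids if s < retire_prior_to]:
            del self.unused_cids[old]

    def on_preferred_address(self, ip: str, cid: bytes) -> None:
        # Treat the transport parameter's CID like any NCID-issued CID
        # (sequence number 1), and remember the address separately.
        self.preferred_ip = ip
        self.on_new_connection_id(1, cid)

    def on_handshake_confirmed(self) -> Optional[Tuple[str, bytes]]:
        # Migrate only if an address is stored and an unused CID is available.
        if self.preferred_ip is None or not self.unused_cids:
            return None
        seq = min(self.unused_cids)  # arbitrary policy: lowest unused sequence
        return self.preferred_ip, self.unused_cids.pop(seq)
```

In this sketch, if the server retires the transport parameter's CID before the handshake is confirmed, the client simply migrates with whichever unused CID remains, with no special-case code tying the address to one CID.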

@MikeBishop
Contributor

It's a little trickier for servers. Let's say, hypothetically, that the server knows it's using a different CID pool on the alternative address; it picks one for the public address in the handshake, then gives one from the preferred address's CID pool in the TP. It doesn't send NCID frames right away. That's fine.

But let's say the client doesn't migrate successfully. At some point, the server wants to start issuing the client new CIDs for one or the other of the paths, but QUICv1 doesn't have a way to identify path-bound CIDs. So the server has to give up on the preferred address if it can't handle CIDs coming to either interface. There isn't a defined cut-off for when the server should give up expecting the connection attempt, so the best way to do this is to put RPT=2 in the NCID frames.

It seems cleaner initially to special-case the CID from the Preferred Address, and say that you have to use that specific CID to do the migration. But even with that code, the server has to do the same thing: wait to see whether the client migrates or not before issuing new CIDs.

I think the piece we're missing is that when a client declares a migration unsuccessful, it MUST/SHOULD retire the CID it used to attempt the migration. That gives the server a clear signal, succeed or fail; it can then proceed to issue CIDs for the surviving server address.

@ianswett
Contributor

Good suggestion on retiring the CID immediately if the migration was unsuccessful. That does provide a clearer signal than the ambiguity we have today.

@kazuho
Member Author

kazuho commented Jan 18, 2020

@MikeBishop

But even with that code, the server has to do the same thing: wait to see whether the client migrates or not before issuing new CIDs.

I think the piece we're missing is that when a client declares a migration unsuccessful, it MUST/SHOULD retire the CID it used to attempt the migration. That gives the server a clear signal, succeed or fail; it can then proceed to issue CIDs for the surviving server address.

That's a keen observation. I personally favor the idea of using the RETIRE_CONNECTION_ID frame to indicate if the client has finished migration to the preferred address or if it would continue using the original address.

However, such a change would require every client to recognize TP.preferred_address, in the sense that even a client lacking support for intentional migration would be required to signal the retirement of the CID associated with TP.preferred_address. I think we have assumed until now that a client that does not implement migration (or migration to a preferred address) would simply ignore this transport parameter.

We need to be clear about that.

@erickinnear
Contributor

I think the piece we're missing is that when a client declares a migration unsuccessful, it MUST/SHOULD retire the CID it used to attempt the migration. That gives the server a clear signal, succeed or fail; it can then proceed to issue CIDs for the surviving server address.

I think this is only particularly relevant to SPA, since that's the only time that a server is expecting a migration.

such a change would require every client to recognize TP.preferred_address

Definitely worth being clear about what to do if you also disable migration, but implementations that disable migration still need 99% of the rest of the machinery for other purposes. Requiring them to immediately send a retirement for the TP.preferred_address CID seems okay, and slightly less convoluted than having some "ignore this if that" statement around the whole thing.

@martinthomson
Member

As requested, I'm forwarding this comment regarding:

Retirement of either of these connection IDs notifies the server of the address the client has chosen.

This implies a semantic to the retirement of connection IDs that is not already defined. It says that in addition to releasing the resource, the server can say definitively that those other network paths won't be used. But this is misleading because retiring CID 1 does not prevent CID 4 from being used on that path. Nothing says that the connection IDs sent in NEW_CONNECTION_ID have to be used on one or other path. Better to keep the requirement where it is: don't migrate back if you use a preferred address.

Yes, that means that servers can't be sure of behaviour of clients here, but they can use the destination address to confirm acceptance of the preferred address or not. That should suffice.

@kazuho
Member Author

kazuho commented Jan 20, 2020

@martinthomson

Retirement of either of these connection IDs notifies the server of the address the client has chosen.

This implies a semantic to the retirement of connection IDs that is not already defined. It says that in addition to releasing the resource, the server can say definitively that those other network paths won't be used. But this is misleading because retiring CID 1 does not prevent CID 4 from being used on that path.

I think that the point being missed here is that under a given condition the server can determine the path being used for CID4.

If CID4 is issued before the client chooses the path, it is true that the server would not be able to determine the path on which CID4 will be used.

But if the server withholds NCID frames until the client chooses the path and signals its choice by retiring one of the first two CIDs, then the server can tell for sure, because either the first or the second CID would be retired first, and that tells the server which path the client has chosen. As we agree, we already state that a client cannot come back to the original address once it migrates to the preferred address.

The benefit of requiring such retirement is that the server can issue CIDs specific to the server address, as pointed out by #3353 (comment). Assuming that such server deployment would be within our design scope, I think requiring such retirement is not a bad idea.
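
The retirement signal proposed in this comment could be modeled on the server side roughly like this. It is a sketch of the proposal under discussion, not of specified QUIC behavior. It assumes sequence number 0 for the handshake CID and 1 for the CID from preferred_address, and the class and method names are hypothetical.

```python
from typing import Optional

ORIGINAL_CID_SEQ = 0   # assumption: CID established during the handshake
PREFERRED_CID_SEQ = 1  # assumption: CID carried in preferred_address


class ServerPathTracker:
    """Hypothetical server state that infers the client's choice from retirements."""

    def __init__(self) -> None:
        self.chosen_address: Optional[str] = None

    def on_retire_connection_id(self, seq: int) -> None:
        # Only the first retirement among the two initial CIDs carries the signal.
        if self.chosen_address is not None:
            return
        if seq == ORIGINAL_CID_SEQ:
            # The client dropped the handshake CID: it moved to the preferred address.
            self.chosen_address = "preferred"
        elif seq == PREFERRED_CID_SEQ:
            # The client dropped the migration CID: it stays on the original address.
            self.chosen_address = "original"

    def may_issue_new_cids(self) -> bool:
        # Withhold NCID frames until the client's choice is known.
        return self.chosen_address is not None
```

Once `may_issue_new_cids()` returns true, a server following this proposal could issue CIDs specific to the surviving server address.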

@MikeBishop
Contributor

If the migration is unsuccessful, the client must never try the preferred address again (MUST continue sending all future packets....). So as @kazuho says, the server can issue address-specific (not path-specific) CIDs once it knows whether the client's migration succeeded or failed.

But @martinthomson is correct that this is implicitly sending a signal about failed migration. If the server sees a CID retired, apparently never used, it might be able to infer that the CID was used for a failed migration, but this case is unique in that the server needs to take some action for a failed migration and uses this signal to do it.

@mnot mnot added this to Triage in Late Stage Processing Jan 21, 2020
@erickinnear
Contributor

So far, we haven't tied the server's knowledge of the client's address to anything about a CID, and a lot of the attacks against migration involve an actor on the network changing the path & address from which the server thinks the packets are coming, even if the client does not actually change anything. I haven't thought this all the way through yet, but it seems as though trying to have the server issue "address-specific" CIDs leads us into some potentially tricky territory.

@martinthomson
Member

@erickinnear made me realize the nature of the hazard here.

It seems fairly natural for a server to bind connection IDs to the current socket address. That is part of why we have new connection IDs attached to the preferred address and the forced retirement. After all, if the server uses a different target address, the hash at a load balancer could change and cause packets to be badly routed. Not including the target address in the hash would work, but it would force connection IDs to be bigger.

But this all assumes that you never expect the server to migrate. If the server ever wanted to migrate, then the client is stuck with a bunch of unusable connection IDs for its own migrations.

We decided that only clients can initiate migration in this version, which might mean that we don't need to worry about this case. We decided not to allow server migration primarily to avoid having to deal with the complex problems that ICE handles. But it was only the lack of mechanisms, not a structural constraint. Now it seems like we have a structural constraint that would prevent server migration.

At a minimum, it seems like we should probably acknowledge this constraint somehow.
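
The load-balancer hazard described in this comment can be illustrated with a toy routing function. This is only a sketch: the hash-based scheme, `pick_backend`, and its parameters are assumptions for illustration, not how any particular load balancer or the QUIC-LB draft works. The addresses are documentation examples.

```python
import hashlib


def pick_backend(cid: bytes, num_backends: int, target_addr: str = "") -> int:
    """Map a packet to a backend index from its CID and, optionally, its destination address."""
    digest = hashlib.sha256(target_addr.encode("utf-8") + cid).digest()
    return int.from_bytes(digest[:8], "big") % num_backends


cid = bytes.fromhex("0a1b2c3d")

# Hash over the CID only: the same backend is chosen whichever
# server address the packet arrives on.
stable = pick_backend(cid, 16)

# Hash over address + CID: the result can differ between the original
# and the preferred address, so the same CID may be routed differently.
via_original = pick_backend(cid, 16, "192.0.2.1:443")
via_preferred = pick_backend(cid, 16, "198.51.100.7:443")
```

When the target address is part of the hash input, a CID issued for one server address is not guaranteed to reach the same backend on another, which is why such CIDs end up tied to an address.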

@larseggert larseggert added the design An issue that affects the design of the protocol; resolution requires consensus. label Feb 4, 2020
@project-bot project-bot bot moved this from Triage to Design Issues in Late Stage Processing Feb 4, 2020
@MikeBishop
Contributor

The summary of our discussion over dinner last night is that there are various mechanisms a server could employ, but they all depend on one key point: Both endpoints are under joint control, because they're cooperating to handle the connection. Therefore, it's possible to generate a CID that each endpoint will be able to work with. (There are various approaches to generating such a CID, which are implementation-specific; one or more approaches might be described in the QUIC-LB draft.)

That means we don't need to separate CIDs by endpoint; if the migration is successful, the server could choose to issue CIDs that aren't valid on the handshake endpoint, which is now out of the picture.

@huitema
Contributor

huitema commented Feb 6, 2020

Seems like a great task for V2: define a policy for handling CIDs. Or we could try to do that in V1 and ship in 2021.

@larseggert
Member

Discussed in ZRH. Proposed resolution is to close with no action. New Editorial issue to be opened to explain the intended use of preferred_address and the CID it contains.

@martinthomson
Member

An important point to capture here is that the server is responsible for ensuring that the connection IDs are not bound to one or other address at this stage (if migration is successful, later ones might be). Thus, the connection ID in the transport parameter is no different to any other connection ID; it only exists here because using the new address is unusable without more connection IDs being available.

@kazuho
Member Author

kazuho commented Feb 6, 2020

@larseggert

Proposed resolution is to close with no action. New Editorial issue to be opened to explain the intended use of preferred_address and the CID it contains.

Please do note that this issue points out other editorial concerns (see the original problem statement). I'd prefer to see them addressed too (and we have WIP text in #3354).

@LPardue LPardue moved this from Design Issues to Triage in Late Stage Processing Feb 21, 2020
@LPardue LPardue moved this from Triage to Design Issues in Late Stage Processing Feb 21, 2020
@martinthomson
Member

martinthomson commented Apr 15, 2020

My hope was that the changes in #3354 would help with the editorial parts of the resolution we agreed in Zurich. However, that seems to be stalled. I am going to mark this as proposal ready and will open an editorial issue to track the remaining work on clarifying the text.

To provide abundant clarity: the proposed resolution to the design issue is to make no substantive changes to the protocol.

@martinthomson martinthomson added the proposal-ready An issue which has a proposal that is believed to be ready for a consensus call. label Apr 15, 2020
@project-bot project-bot bot moved this from Design Issues to Consensus Emerging in Late Stage Processing Apr 15, 2020
@erickinnear
Contributor

erickinnear commented Apr 20, 2020

To echo #3354:

I think #3354 was going to be replaced with something a bit more condensed, which means that the result here will be "no design changes to be made, editorial text clarified via #3497 and #3589".

@LPardue LPardue added call-issued An issue that the Chairs have issued a Consensus call for. and removed proposal-ready An issue which has a proposal that is believed to be ready for a consensus call. labels Apr 29, 2020
@project-bot project-bot bot moved this from Consensus Emerging to Consensus Call issued in Late Stage Processing Apr 29, 2020
@LPardue LPardue added has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list. and removed call-issued An issue that the Chairs have issued a Consensus call for. labels May 13, 2020
@project-bot project-bot bot moved this from Consensus Call issued to Consensus Declared in Late Stage Processing May 13, 2020
Late Stage Processing automation moved this from Consensus Declared to Issue Handled May 17, 2020
Projects
Late Stage Processing
  
Issue Handled
8 participants