Does the Path ID change if the CID changes or not? #169

Closed
mirjak opened this issue Feb 21, 2023 · 26 comments · Fixed by #185

@mirjak
Collaborator

mirjak commented Feb 21, 2023

Currently it seems that the assumption is that the Path ID changes if the CID changes, however, this makes handling of MP frames that contain the path ID more complicated because you have to remember the old path ID to process these frames even when the CID is retired. Is there any good reason why we don't keep the Path ID constant over the lifetime of a path?

@mirjak
Collaborator Author

mirjak commented Feb 21, 2023

Also note that if you retire a CID (e.g. because the peer requests it in the Retire Prior To field of a NEW_CONNECTION_ID frame) and you don't have a new CID available anymore (yes, this should usually not happen), you cannot send a PATH_ABANDON frame anymore, because you don't have a valid Path ID (and we don't have the type field anymore).

@huitema
Contributor

huitema commented Feb 26, 2023

The sequence number is tied to the CID, because of encryption. If we tie it to something else, we are just moving the problem, because then we have to worry about mapping the CID to a path ID.

Yes, this has a couple of constraints. If a CID is actually retired, there is no way to acknowledge the packets that were sent using that CID. Effectively, they are treated as lost. The obvious solution is to write "don't do that" guidelines. Basically, wait a couple of RTOs before retiring a CID, so the ACKs have time to arrive.

The other constraint is that ACK-MP carry the ID used when sending the packets. This is not a huge constraint, but it is an exception to the proposed "same path for ACK" policy. But the main argument for that same path policy is about computing timers. That argument does not really apply when renewing the CID.

Same issue for the path abandon. If packets are sent on the same four tuple with a new CID, just abandon that.

@yfmascgy
Contributor

yfmascgy commented Feb 27, 2023

In addition to the encryption consideration mentioned above, I remember there was another reason why the sequence number of the CID was chosen to identify a path. It brings us back to the definition of what a path is and what a path identifier is.

By definition of the draft and RFC9000, a path is defined by a 4-tuple {source IP address, source port number, destination IP address, destination port number}, and a path ID is an identifier used to identify a path. That means we need to construct a mapping between a path ID and the associated 4-tuple.

However, there are a couple of issues. First, the client's view and the server's view of the 4-tuple are not the same due to NAT. Second, the 4-tuple can change due to (1) network path migration (think about the case where our phone's wifi is connected to a wifi router that has two outbound networks to the internet, one wireline and one satellite, and the router can switch between the two), and (2) NAT rebinding. If we use a constant path ID, then the path ID does not uniquely identify a 4-tuple; instead, there will be a one-to-many mapping from a path ID to 4-tuples.

In #179, Quentin proposed to use entry point IDs to identify a path. I think this identification is stable in the case of NAT rebinding, where the 4-tuple changes but the physical path is actually unchanged. However, the problem is that when the underlying path has changed (in the above case, a router may switch packets from the wireline link to the satellite link to perform failover) while the entry point IDs on the end hosts remain unchanged, we end up using the same path identifier to identify different paths.

These problems can be solved by using QUIC's CID sequence number to identify a path. First, it allows the client and server to identify a path separately, as they see different 4-tuples. Second, when the 4-tuple changes, we eventually use new CIDs to communicate, and thus the mapping between a path ID and the 4-tuple remains one-to-one. Third, we are logically correct in the case where the physical path has indeed migrated while the end-host interfaces have not changed.

@huitema
Contributor

huitema commented Feb 27, 2023

I don't think that "this makes handling of MP frames that contain the path ID more complicated". You have a code object holding the list of packets to acknowledge (receiver side), or an object containing the list of packets not acknowledged yet (sender side). These objects are tied to the connection ID. Nothing particularly complicated there: the sender knows exactly which connection ID it uses for a given packet; the receiver knows exactly what connection ID was carried in the packets that it received.
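
A minimal sketch of the receiver-side bookkeeping described here, assuming hypothetical names (`AckState`, `Receiver`) that do not come from the draft or from any implementation: the list of packets to acknowledge is simply keyed by the sequence number of the CID carried in the received packets.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AckState:
    """Packet numbers received under one CID and not yet acknowledged."""
    pending: set[int] = field(default_factory=set)


class Receiver:
    """Hypothetical receiver: one packet number space per CID sequence number."""

    def __init__(self) -> None:
        self.by_cid_seq: dict[int, AckState] = {}

    def on_packet(self, cid_seq: int, packet_number: int) -> None:
        # The receiver knows which CID the packet carried, so it knows the space.
        self.by_cid_seq.setdefault(cid_seq, AckState()).pending.add(packet_number)

    def packets_to_ack(self, cid_seq: int) -> list[int]:
        state = self.by_cid_seq.get(cid_seq)
        return sorted(state.pending) if state else []
```

The same packet number can appear under two different CID sequence numbers without conflict, because each CID has its own state object.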

The issue that you mention comes from the ambiguity of "Abandon Path", "Retire connection ID", and "retire previous connection ID". We have to be clear about what that means:

  • "Abandon Path" means "please do not send any more packets on that path". No ambiguity there. It also implies: "you can free the resources of that path at your convenience". We need to specify how and when; there is some leeway.
  • "Retire Connection ID" means "I have deleted all resources tied to the connection ID on my side", which implies "I will not use this connection ID to send any new packet" and also "I will not accept any new MP-ACK sent using the sequence number of that CID".
  • "Retire previous CID" means "please retire these CIDs quickly". RFC 9000 says "The endpoint SHOULD continue to accept the previously issued connection IDs until they are retired by the peer", so there is some leeway.

So I guess we have the following actions:

  • A node decides to stop using a CID, i.e., not send any more packets using that CID. The node can do that at any time of its own accord, and SHOULD do that immediately after receiving an Abandon_Path frame, or after receiving a "retire before" indication.
  • The node should wait 3*RTO after that decision, so pending acknowledgements have enough time to arrive.
  • After 3*RTO, the node deletes the local resource, including the list of packets not acknowledged yet. Those packets should be considered lost.
  • The node then sends a Retire Connection ID frame.
  • The node receiving the Retire Connection ID frame deletes the resource associated with the CID, i.e., the list of packets to acknowledge. If the peer did wait long enough before sending Retire Connection ID frame, that list should be empty. If it is not, too bad, it can still be deleted, these packets will never be acknowledged, but the sender of the CID does not care.
  • After that, a node can still receive an MP-ACK or an Abandon_Path mentioning an old CID, because we do have race conditions. Such frames should be silently ignored.

Does that remove the ambiguities?
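
The sequence above can be sketched as follows. This is only an illustration under stated assumptions: `Sender`, `CidState`, `stop_using`, `maybe_retire` and `on_mp_ack` are hypothetical names, time is passed in explicitly, and sending the actual Retire Connection ID frame is left to the caller.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CidState:
    unacked: set[int] = field(default_factory=set)  # sent, not yet acknowledged
    stopped_at: Optional[float] = None              # when we stopped sending


class Sender:
    def __init__(self, rto: float) -> None:
        self.rto = rto
        self.cids: dict[int, CidState] = {}

    def stop_using(self, cid_seq: int, now: float) -> None:
        """Step 1: stop sending with this CID and start the 3*RTO wait."""
        self.cids[cid_seq].stopped_at = now

    def maybe_retire(self, cid_seq: int, now: float) -> Optional[set[int]]:
        """Steps 2-4: after 3*RTO, delete the state and return the lost packets.

        Returns None while still waiting. On a non-None result the caller
        would send the Retire Connection ID frame for cid_seq.
        """
        state = self.cids.get(cid_seq)
        if state is None or state.stopped_at is None:
            return None
        if now - state.stopped_at < 3 * self.rto:
            return None
        del self.cids[cid_seq]
        return state.unacked

    def on_mp_ack(self, cid_seq: int, acked: set[int]) -> bool:
        """Apply an MP-ACK; silently ignore it if the CID state is gone."""
        state = self.cids.get(cid_seq)
        if state is None:
            return False
        state.unacked -= acked
        return True
```

With an RTO of 1 second, a CID that stops being used at t=10 keeps accepting acknowledgements until t=13; whatever is still unacknowledged at that point is declared lost, and any later MP-ACK for that CID is dropped.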

@qdeconinck
Contributor

I think the current specification (i.e., rely on the sequence number of the Destination CID to identify a path in PATH_STATUS/PATH_ABANDON) works fine as long as 1) the perceived 4-tuple of a network path remains stable, or 2) the DCID used on a network path remains stable. Also, if the peer has no timing restrictions on when to retire a CID, this is fine as well.

However, when receiving a NEW_CONNECTION_ID frame with a "Retire Prior To" field forcing the receiver to remove DCIDs, RFC 9000 states that "Upon receipt of an increased Retire Prior To field, the peer MUST stop using the corresponding connection IDs and retire them with RETIRE_CONNECTION_ID frames before adding the newly provided connection ID to the set of active connection IDs." I am not sure whether the receiver of the NEW_CONNECTION_ID frame may delay the retirement of DCIDs for a long period. Once they are retired, the host cannot reference the paths identified by the retired CIDs in, e.g., PATH_ABANDON frames, which may be an issue if the peer does not provide enough new CIDs.

But the current path identification is fragile when one host (the client) changes the DCID used over the same perceived 4-tuple while the other (the server) perceives a different 4-tuple (the typical case of NAT rebinding). The client will keep a single view of that path, but the server will have two different views, with different 4-tuples and Path IDs, of what is actually the same path, which makes path identification ambiguous.

@yfmascgy As long as the 4-tuple does not change, there is no "path change" from the QUIC viewpoint. Having "backbone router path migration" will be transparent to the end hosts (although path characteristics such as latency and bandwidth may change). The EntryPoint ID proposal is mainly to keep stable path identification in PATH_ABANDON/PATH_STATUS frames when 1) CIDs change and the sender's perceived 4-tuple is stable but the receiver's perceived 4-tuple is not, and 2) when there is no more usable CID over the (network) path to reference.

@huitema
Contributor

huitema commented Feb 27, 2023

So the scenario that you worry about is "what if the client sleeps for a while, then decides to switch the connection ID before sending new packets". This is actually a recommended behavior, for privacy reasons, so we had better get it right.

I think that the current proposal mostly works. After the migration, the new packets sent by the client will be tied to the new CID on both sides -- see previous points about tying number space to connection IDs. There is no ambiguity about packet number space, packets will be correctly acknowledged, etc. The Abandon_Path or MP_ACK frames will not be ambiguous: they refer to the number space, not to the abstract concept of Path.

The main problem is that the server will probably NOT treat the incoming packet with a new CID as a "nat rebinding". The algorithm pretty much defines NAT rebinding as "same CID, different addresses or ports". The conforming server will treat that as a new path, perform address validation, etc. If the server tries to use the old "path", the packets will be dropped by the NAT. The connection will recover eventually, but after some packet losses, and yes, that's not desirable.

I think the simplest solution is for the client to somehow tell the server what it is doing. "These packets carry CID sequence number 7. For me, this is the same path as when previously using CID sequence number 4." My gut feeling is that this can be achieved by sending an "Abandon_Path(id=4, errorCode=CidRenewal)" when starting sending with CID 7. Specify something like that in the spec, document the error code, etc.
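
One possible reading of this suggestion, sketched from the server's side. Everything here is an assumption for illustration: the `CID_RENEWAL` value, the `PathState` fields and the `on_abandon_path` helper are hypothetical, and carrying the path state over to the new CID is only one way the server might use the hint.

```python
from __future__ import annotations

from dataclasses import dataclass

CID_RENEWAL = 0x01  # hypothetical error code value; the thread does not assign one


@dataclass
class PathState:
    four_tuple: tuple[str, int, str, int]
    smoothed_rtt: float


def on_abandon_path(paths: dict[int, PathState], old_seq: int,
                    error_code: int, arriving_seq: int) -> None:
    """Handle Abandon_Path(id=old_seq) received in a packet using CID arriving_seq."""
    state = paths.pop(old_seq, None)
    if state is None:
        return  # unknown or already retired ID: silently ignore
    if error_code == CID_RENEWAL:
        # The client says this is the same path under a new CID: keep its state.
        paths[arriving_seq] = state
```

In the example from the comment, `on_abandon_path(paths, old_seq=4, error_code=CID_RENEWAL, arriving_seq=7)` would move the state held for CID sequence number 4 to sequence number 7 instead of discarding it.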

@mirjak
Collaborator Author

mirjak commented Feb 27, 2023

Please note that PR #172 is related here. In this PR I propose that a new 4-tuple is always treated as a new path and therefore always triggers path validation, which then ensures that a new path is created on both sides. I thought a while about this scenario when working on this PR and believe that's the easiest, unambiguous solution, which I think is in line with the multipath idea.

Further, I think that en-/decryption and the path ID do not need to be entangled. Of course we need to use the CID to decrypt the packet, however, I don't think that changing the CID (without changes of the 4-tuple) creates a new packet number space. This is not the case in RFC9000 and should not be the case here.

@yfmascgy
Contributor

> Please note that PR #172 is related here. In this PR I propose that a new 4-tuple is always treated as a new path and therefore always triggers path validation, which then ensures that a new path is created on both sides. I thought a while about this scenario when working on this PR and believe that's the easiest, unambiguous solution, which I think is in line with the multipath idea.

I actually agree with this idea. It is simple and non-ambiguous. Also note that NAT rebinding is a low-probability event. I think the cost of treating a new 4-tuple as a new path is negligible.

@yfmascgy
Contributor

> @yfmascgy As long as the 4-tuple does not change, there is no "path change" from the QUIC viewpoint. Having "backbone router path migration" will be transparent to the end hosts (although path characteristics such as latency and bandwidth may change). The EntryPoint ID proposal is mainly to keep stable path identification in PATH_ABANDON/PATH_STATUS frames when 1) CIDs change and the sender's perceived 4-tuple is stable but the receiver's perceived 4-tuple is not, and 2) when there is no more usable CID over the (network) path to reference.

It is not necessarily true that the router path migration is always transparent. In the case where a wifi router switches between a terrestrial ISP and a satellite backbone, the outbound packets may have different source IP addresses, since the satellite network and the terrestrial network are operated by different ISPs. In this case, the path switch will cause the 4-tuple to change.

@huitema
Contributor

huitema commented Feb 28, 2023

The difference between "new path" and "rebinding" is probably less than it sounds. In the behavior suggested by RFC 9000, arrival of packets with the same CID and a new 4-tuple triggers both validation of the new path using Path Challenge, and verification that the old path is actually gone by a parallel Path Challenge on the old path. The verification is there to deal with a potential attack in which an on-path attacker (e.g., somebody on the same Wi-Fi network) can capture a 1RTT packet sent by the client, and then resend it from a spoofed IP address.

I am a bit concerned that "treating NAT rebinding as a new path" will keep the Path Challenge on the new path, but not the verification attempt of the old path, and thus expose us to the attack that the verification is meant to mitigate.

@huitema
Contributor

huitema commented Feb 28, 2023

@yfmascgy when you say "a wifi router switches between a terrestrial ISP and a satellite backbone", I understand that you want the concept of path in QUIC to follow the topology changes in the network. But we should recognize that this will not always be possible. For example, routing changes can always cause a single 4-tuple to be suddenly routed in a very different way.

@yfmascgy
Contributor

> @yfmascgy when you say "a wifi router switches between a terrestrial ISP and a satellite backbone", I understand that you want the concept of path in QUIC to follow the topology changes in the network. But we should recognize that this will not always be possible. For example, routing changes can always cause a single 4-tuple to be suddenly routed in a very different way.

I think if the 4-tuple does not change, we do not pursue the goal of following the topology change as it is basically not possible. However, my point is when there is a topology change and a 4-tuple change (now we have a detectable signal), we probably do not want to treat it as a NAT rebinding. As we are actually on a different path, retaining the congestion control state and round-trip estimate does not seem to be right. A more reasonable action is to treat it as a new path and reset the congestion controller, right?

@qdeconinck
Contributor

> So the scenario that you worry about is "what if the client sleeps for a while, then decides to switch the connection ID before sending new packets". This is actually a recommended behavior, for privacy reasons, so we had better get it right.

Exactly.

> I think the simplest solution is for the client to somehow tell the server what it is doing. "These packets carry CID sequence number 7. For me, this is the same path as when previously using CID sequence number 4." My gut feeling is that this can be achieved by sending an "Abandon_Path(id=4, errorCode=CidRenewal)" when starting sending with CID 7. Specify something like that in the spec, document the error code, etc.

I think the proposed approach here could keep changes minimal, although this is not yet perfect. If the client sends two packets with the new CID, but the first one containing such information is lost (but not the second one, which experiences NAT rebinding), then we still face the issue. Of course, we will eventually recover the situation, but there will be a transient state here.

> Also note that NAT rebinding is a low-probability event.

Maybe, but I'm not convinced we should take this assumption for granted.

More generally, I am starting to think about possible security issues we may encounter with this, where the client may frequently rotate the DCID it uses over a path for legitimate purposes, but an on-path attacker may tweak the 4-tuple, hoping it will make it create a lot of (invalid) paths' states at server side. I am not sure this will be a serious issue, but we will need to document it at some point.

> I think if the 4-tuple does not change, we do not pursue the goal of following the topology change as it is basically not possible. However, my point is when there is a topology change and a 4-tuple change (now we have a detectable signal), we probably do not want to treat it as a NAT rebinding. As we are actually on a different path, retaining the congestion control state and round-trip estimate does not seem to be right. A more reasonable action is to treat it as a new path and reset the congestion controller, right?

I think in case we notice a path "migrated" (same CID, different 4-tuple), we can apply the "RFC9000 connection migration" feature on a per-path level. In such case, we can follow Section 9.4 of RFC9000, stating that congestion control/RTT must be reset, unless it experiences a port-only change (reset is not mandatory in such cases).
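
As a small illustration of the Section 9.4 rule applied per path, assuming a hypothetical helper `must_reset_congestion_state` and peer addresses represented as `(ip, port)` pairs:

```python
from __future__ import annotations


def must_reset_congestion_state(old_peer: tuple[str, int],
                                new_peer: tuple[str, int]) -> bool:
    """Return True when the peer address changed in more than its port.

    A port-only change (typical of NAT rebinding) does not require resetting
    the congestion controller and RTT estimator; an IP address change does.
    """
    old_ip, _old_port = old_peer
    new_ip, _new_port = new_peer
    return old_ip != new_ip
```

The check would be run once the migrated path has been validated, and only for the path whose 4-tuple changed; the other paths keep their state.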

@yfmascgy
Contributor

yfmascgy commented Feb 28, 2023

Let's take a step back. Using the current path identifier mechanism, we are fine if (1) the CID changes but the tuple does not change and (2) the tuple changes but the CID does not change. The only difficult case is when the client uses a new CID and the tuple changes (NAT rebinding) at the same time. Note that a NAT rebinding is improbable if packets were recently received on the old path (also see Section 9.3.3 of RFC9000). A NAT rebinding mostly happens if a path is idle for some time, but that is already managed by Section 4.3.4 of the current draft:

"Hosts SHOULD stop sending traffic on a path if for at least the period of the idle timeout as specified in Section 10.1. of [QUIC-TRANSPORT] (a) no non-probing packet was received or (b) no non-probing packet sent over this path was acknowledged, but MAY ignore that rule if it would disqualify all available paths."

In other words, if a path is idle long enough for the 4-tuple to be changed by the NAT, it should already have been closed, unless there is only one path left. Therefore, the chance that the server sees (1) >=2 active paths and (2) a packet that has a new CID and a new 4-tuple at the same time is low.

When the server sees (1) >=2 active paths and (2) a packet that has a new CID and a new 4-tuple, the issue is that we don't know which path the packet belongs to. Hence, we just treat it as an attempt for a new path as @mirjak suggested. However, in the current draft, when the client initiates a new path, the packet should contain a PATH_CHALLENGE frame (see figure 2). If the packet does not contain PATH_CHALLENGE, we should just ignore the packet, which eventually leads to path closure as specified by the current draft (see figure 1). Note that in doing so, we are also resilient to the attacks mentioned by @qdeconinck that "the client may frequently rotate the DCID it uses over a path for legitimate purposes, but an on-path attacker may tweak the 4-tuple, hoping it will make it create a lot of (invalid) paths' states at server side".

When the server sees (1) only one active path and (2) a packet with a new CID and a new 4-tuple that does not carry PATH_CHALLENGE, it knows which path the packet belongs to. We then just follow what single-path QUIC does, and the behavior of multipath QUIC and single-path QUIC converges in this case.
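
The case analysis above for a packet with (new CID, new 4-tuple) can be sketched as a small decision function. The names (`Action`, `classify_new_cid_new_tuple`) are hypothetical and the draft does not define such an API; the function only restates the three cases discussed here.

```python
from enum import Enum


class Action(Enum):
    VALIDATE_NEW_PATH = "validate_new_path"
    FOLLOW_SINGLE_PATH_QUIC = "follow_single_path_quic"
    DROP = "drop"


def classify_new_cid_new_tuple(has_path_challenge: bool,
                               active_paths: int) -> Action:
    """Decide what to do with a packet whose CID and 4-tuple are both new."""
    if has_path_challenge:
        # The peer is explicitly opening a new path.
        return Action.VALIDATE_NEW_PATH
    if active_paths >= 2:
        # Cannot tell which existing path the packet belongs to.
        return Action.DROP
    # Only one path exists, so the packet can only belong to that one.
    return Action.FOLLOW_SINGLE_PATH_QUIC
```

Dropping in the ambiguous case is also what limits the state an on-path attacker can create by rewriting the 4-tuple, since no path state is allocated without a PATH_CHALLENGE.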

Therefore, I think the current draft can already address the issue, and we probably just want to stick to what we have with some minor modifications.

@mirjak
Collaborator Author

mirjak commented Feb 28, 2023

If the server sees a packet with a new CID and a new 4-tuple but without a PATH_CHALLENGE frame, it should ignore it, as it can't associate it with an existing path. If the client switches to a new path (new 4-tuple), it has to use a new CID and send a PATH_CHALLENGE frame.

The problem I'm trying to address is the set of cases where the CID (and path ID) changes for whatever reason and we receive some valid frames with the old path ID that we can't associate with anything anymore and therefore (unnecessarily) need to drop. Yes, this should not happen that often, so dropping is not the worst option. However, I think having a stable path ID would also make the whole approach logically easier, but maybe that's something we need to discuss and agree on.

Note that if we change the path ID, that automatically means that we have to silently ignore any unknown path ID, as we never know for sure if it was an old path ID or really a completely invalid one. Otherwise we could raise an error if we receive an invalid path ID. This could help to avoid the endpoints getting out of sync. However, I am not sure if that is actually a problem that needs solving.

In both cases I think we definitely need to clarify some things in the draft and make sure the taken approach is crystal clear to the reader.

@huitema
Contributor

huitema commented Feb 28, 2023

When to reset the congestion controller? That's a very generic problem, and the IETF is creating a WG to study this kind of issue. I think we should not try to invent a multipath-specific solution, and just stick with whatever RFC 9000 says now, or what new RFCs will say later.

@huitema
Contributor

huitema commented Feb 28, 2023

The more I read this discussion, the more I think we should stick to the path handling specification in RFC 9000. Do a search for NAT in the archive of QUIC GitHub issues, and you find a big list of messages. Lots of history there, with discussion of congestion control, security issues, DOS attacks, etc. Of course, we could have these discussions again in the multipath context, but we would converge much faster if we did not!

@mirjak
Collaborator Author

mirjak commented Feb 28, 2023

We should not change the guidance on congestion control in RFC9000. Otherwise, however, I don't really understand what you mean by "stick to path handling in RFC9000". RFC9000 only has one path at a time, and this is exactly the part that this extension is changing, so I don't see it as a deviation from the migration handling in RFC9000. Again, we have to make a decision and define it clearly, but I wouldn't say that one or the other solution is more in line with RFC9000.

@huitema
Contributor

huitema commented Feb 28, 2023

Looking back at Mirja's original question, "Currently it seems that the assumption is that the Path ID changes if the CID changes, however, this makes handling of MP frames that contain the path ID more complicated because you have to remember the old path ID to process these frames even when the CID is retired. Is there any good reason why we don't keep the Path ID constant over the lifetime of a path?"

I think we should just bite the bullet, and acknowledge that retiring a CID makes it unusable in the future -- MP frames mentioning the corresponding ID will just be silently ignored. Yes, this can cause for example some spurious packet loss detection, but that can be minimized with strict rules about when to retire a CID -- the various RTO guidelines that we discussed in other threads. Better expose the consequence clearly, so nodes that think of retiring a CID think twice.

@huitema
Contributor

huitema commented Feb 28, 2023

As for "same as RFC 9000", yes we will of course have one difference, since we handle multiple parallel paths. I am really looking at the reactions to NAT rebinding, how we define what is a NAT rebinding and what is not, etc.

@yfmascgy
Contributor

yfmascgy commented Mar 1, 2023

I think we can use path ID mechanism of the current draft with the addition of the following rules to address the issues discussed in this thread:

(1) Addressing the corner case when new CID and NAT rebinding happen at the same time. When an endpoint receives a packet that has (new CID, new tuple), check if the packet is trying to initialize a new path (i.e., whether it contains a PATH_CHALLENGE). If it has PATH_CHALLENGE, try creating a new path. If it has no PATH_CHALLENGE and we don't know which path the packet is associated with (num. of paths>=2), discard the packet. If there is only one path, we perform path validation as single path QUIC does.

(2) Addressing the issue that you cannot send PATH_ABANDON when you don't have CIDs available to use on a path.

> Also note that if you retire a CID (e.g. because the peer requests it in the Retire Prior To field of a NEW_CONNECTION_ID frame) and you don't have a new CID available anymore (yes, this should usually not happen), you cannot send a PATH_ABANDON frame anymore, because you don't have a valid Path ID (and we don't have the type field anymore).

When an endpoint receives a NEW_CONNECTION_ID frame that has the Retire Prior To field set, and it finds that this field will leave it with no more CIDs to send packets on a particular path, then before sending the RETIRE_CONNECTION_ID frame, it first sends a PATH_ABANDON frame to signal path closure to the peer.
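
Rule (2) is essentially an ordering constraint on the frames sent for one path. A minimal sketch, assuming a hypothetical helper `frames_after_retire_prior_to` and frames represented as `(name, cid_sequence_number)` pairs; `spare` stands for the sequence numbers of unused CIDs this path could switch to, including the newly provided one:

```python
from __future__ import annotations


def frames_after_retire_prior_to(in_use: int, spare: list[int],
                                 retire_prior_to: int) -> list[tuple[str, int]]:
    """Frames to send, in order, for the path currently using CID `in_use`."""
    if in_use >= retire_prior_to:
        return []  # the CID used on this path is not affected
    survivors = [seq for seq in spare if seq >= retire_prior_to]
    frames: list[tuple[str, int]] = []
    if not survivors:
        # No CID left for this path: signal closure while the ID is still valid.
        frames.append(("PATH_ABANDON", in_use))
    frames.append(("RETIRE_CONNECTION_ID", in_use))
    return frames
```

The sketch only covers the CID in use on the path; spare CIDs below Retire Prior To would also have to be retired, which is omitted here.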

@qdeconinck
Contributor

@yfmascgy I think I can live with your proposed rules. Adding some text stating that a client SHOULD include a PATH_CHALLENGE frame when it changes the DCID it uses (even if it does not change the 4-tuple used) is reasonable to address the aforementioned corner cases.

I wonder, though, whether it would make sense to define a small informative frame that tells the client that the new DCID seen has been rejected because it cannot be mapped to an existing path, suggesting that the client retire the CID and retry with a new one while bundling PATH_CHALLENGE. This is not a big concern, though.

@mirjak
Collaborator Author

mirjak commented Mar 1, 2023

So, yes we can make it work with changing the path ID. But I wonder if things would be simpler if we keep it stable. So asking my initial question again: what are the drawbacks when we try to keep it stable?

@qdeconinck
Contributor

I think it would be nice to have stable Path IDs. However, all the challenge resides in how we could define such stable "path identifiers", as the sending/receiving paths may use different CIDs over their lifetime and may perceive different 4-tuples.

@yfmascgy
Contributor

yfmascgy commented Mar 3, 2023

> I think it would be nice to have stable Path IDs. However, all the challenge resides in how we could define such stable "path identifiers", as the sending/receiving paths may use different CIDs over their lifetime and may perceive different 4-tuples.

That is a good point. The fundamental issue is that, in reality, a path is only precisely defined by the sequence of every hop between two endpoints, but that cannot be observed by the endpoints. So we have to use the 4-tuple definition of a path as a simplified, compromise notation that unfortunately does not capture the full picture. QUIC gets around this issue with the use of connection IDs, so that we are not bothered by unstable path identifiers and can enable path migration. Therefore, I feel that in multipath QUIC we may want to inherit what QUICv1 does and reuse the CID mechanism to identify paths.

qdeconinck added a commit that referenced this issue Mar 3, 2023
In lot of places in the draft, we mention the "Path Identifier",
which we define as the Destination Connection ID sequence number
used over that path. It seems that maintaining the notion of
"Path Identifier" causes a lot of confusion, so better keep things
simple and mention CID sequence numbers directly.

Note that this implies that the related fields in
multipath-specific have been renamed to their "explicit" form. Some
of these names are maybe a bit long, so we may shorten them if this
does not introduce ambiguity.

Fix #169. Rewriting related section should also fix #181.
@qdeconinck
Contributor

Created #188 to address the new CID and NAT rebinding case.
