Fix for off-path migration attack #2033

Merged
merged 5 commits into from
Dec 4, 2018

Conversation

martinthomson
Member

This is not an easy attack to defend against, except for
probabilistically. So what we do is recommend more probing on old paths
to give the endpoint that is apparently migrating more opportunities to
cause the connection to migrate away from the path chosen by an
attacker.

I've tweaked surrounding text a little. The most interesting change is the
3RTO timer on path validation. It's not the right number, but I don't
think that the right number is attainable, and this is close enough.

This text isn't final. I'd like it to be more accurate AND shorter, but
lack the skills and perspective.

Closes #1278, #1749.

@martinthomson added the design and -transport labels on Nov 21, 2018
@kazuho
Member

kazuho commented Nov 21, 2018

Thank you for writing down the PR.

I know that this question has been asked before, but do we need PATH_CHALLENGE and PATH_RESPONSE to verify NAT rebinding?

IIUC a client has only one path that it can use to send non-probing packets. Therefore, the receipt of a non-probing packet (with a PN greater than any the server has seen before) can be used by the server to determine the current active path.

So to me, it seems that just using PING will be sufficient for the server to verify what the client considers to be the current path. In other words, there is no reason for a server to send PATH_CHALLENGE (and for a client to send PATH_RESPONSE).

The reason I raise the question, even though I understand that the PATH_CHALLENGE / PATH_RESPONSE is nevertheless required for the client to "probe" for a new path, is because removing the requirement for the server to send a challenge / wait for response seems like a simplification (at the cost of increasing asymmetry).

EDIT (Nov 22 0339UTC): Retracting the comment. I missed the fact that the entropy provided by the data field of PATH_CHALLENGE / PATH_RESPONSE is necessary to prevent a legitimate but malicious client from redirecting the server's packets to the victim's address.
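
The point made in the retraction above can be sketched in a few lines. This is a minimal illustration, not text from the draft: the helper names `new_challenge` and `response_matches` are hypothetical, and the 8-byte payload size is taken from the transport draft rather than from this thread.

```python
import hmac
import os

CHALLENGE_LEN = 8  # payload size in bytes, per the transport draft (not stated in this thread)


def new_challenge() -> bytes:
    """Return unpredictable data to carry in a PATH_CHALLENGE frame."""
    return os.urandom(CHALLENGE_LEN)


def response_matches(sent: bytes, received: bytes) -> bool:
    """True only if the PATH_RESPONSE echoes exactly the data that was sent."""
    return hmac.compare_digest(sent, received)


challenge = new_challenge()
# A peer that really received the challenge on the probed path can echo it.
echoed_ok = response_matches(challenge, challenge)
# A client that only claims a victim's address never sees the challenge,
# so it would have to guess all 64 bits of the payload.
```

A PING carries no such payload, so acknowledging it proves nothing about who is reachable at the claimed address.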

@@ -1766,7 +1768,8 @@ abandons its attempt to validate the path.

 Endpoints SHOULD abandon path validation based on a timer. When setting this
 timer, implementations are cautioned that the new path could have a longer
-round-trip time than the original.
+round-trip time than the original. A value of three times the current
+Retransmission Timeout (RTO) as defined in {{QUIC-RECOVERY}} is RECOMMENDED.
Contributor

This won’t work out well when switching from a 10ms WiFi.
I don’t think any value derived from the current RTT helps here. We don’t have any information about the new path, so I suggest we define a constant duration.

Contributor

I had the same thought, or a minimum value of sorts.

Member Author

I considered that, but we still have a minimum on the RTO, which should cover this case well enough.
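
To make the exchange above concrete, here is a rough sketch of how a floor on the RTO bounds the 3RTO validation timer when migrating away from a very fast path. The function name and the 200 ms floor are assumptions chosen for illustration; the thread does not state the minimum value.

```python
def path_validation_timeout_ms(current_rto_ms: int, min_rto_ms: int) -> int:
    """Time to wait before abandoning path validation: three times the RTO,
    where the RTO is never allowed to drop below the configured minimum."""
    return 3 * max(current_rto_ms, min_rto_ms)


# Hypothetical numbers: a ~10 ms WiFi path might yield an RTO of about 30 ms.
# With an assumed 200 ms floor, the validation timer is still 600 ms.
fast_path = path_validation_timeout_ms(current_rto_ms=30, min_rto_ms=200)
# On a slower path the measured RTO dominates.
slow_path = path_validation_timeout_ms(current_rto_ms=450, min_rto_ms=200)
```

Whether the floor is large enough for a new path whose round-trip time is unknown is exactly the concern raised by the reviewers.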


An off-path attacker that can observe packets might forward copies of genuine
packets to endpoints. If the copied packet arrives before the genuine packet,
this will appear as a NAT rebinding. Any genuine packet will be discarded as a
Contributor

We should be more precise in describing the attack:

  • when racing packets to the server, the attacker uses his own sender address on the UDP packet
  • when forwarding packets to the client, the attacker doesn’t spoof the server’s address

Member Author

I don't think that we need a precise description of the attack. This is already far too many words.

Contributor

@MikeBishop MikeBishop left a comment

I agree it feels a bit verbose, but I think all the right information is there.

will time out and fail; if the path is viable, but no longer desired, the
validation will succeed, but only result in a probing packet being sent on the
path.

Contributor

Since this is all heuristics, we may want to be a bit more explicit. If the destination CID in the packet is new, then the migration is more likely to be voluntary. If the address family is IPv6, NAT rebinding is not expected quite as much, even with NAT66, since NAT66 is very unlikely to run out of address space. If the last rebinding happened fewer than X seconds ago, this is probably very suspicious. Etc.

Member Author

The point here is that this isn't heuristics-based decision-making. It's currently arranged to be mechanical. The only discretionary part is the rate at which probes are generated. I think that the right thing to do is acknowledge that heuristics might be used to make this defense more robust and to mention some of the things you point to.
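
One way the heuristics mentioned in this thread could feed the only discretionary knob (how many probes are sent on the old path) is sketched below. Everything here is hypothetical: the `MigrationEvent` fields, the equal weighting, the 5-second window and the mapping from score to probe count are illustrative assumptions, not anything proposed in the PR.

```python
from dataclasses import dataclass


@dataclass
class MigrationEvent:
    new_destination_cid: bool            # packet used a previously unused CID
    is_ipv6: bool                        # address family of the new path
    seconds_since_last_rebinding: float  # time since the last apparent rebinding


def suspicion_score(ev: MigrationEvent, recent_window_s: float = 5.0) -> int:
    """Count how many of the three signals point away from a voluntary
    migration or an ordinary NAT rebinding."""
    score = 0
    if not ev.new_destination_cid:
        score += 1  # voluntary migrations are more likely to use a new CID
    if ev.is_ipv6:
        score += 1  # NAT rebinding is less expected on IPv6
    if ev.seconds_since_last_rebinding < recent_window_s:
        score += 1  # rebindings in quick succession look suspicious
    return score


def probes_on_old_path(ev: MigrationEvent) -> int:
    """Send at least one probe on the old path, more when suspicious."""
    return 1 + suspicion_score(ev)
```

The migration decision itself stays mechanical; the score only changes how hard the endpoint probes the old path.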

path via the attack is reliably faster than the original path despite multiple
attempts to use that original path, it is not possible to distinguish between
attack and an improvement in routing.

Contributor

Well, yeah, except if the improvement in routing only lasts for a short while, and after that the connection is dropped.

Member

@kazuho kazuho Nov 22, 2018

Doesn't the connection simply move back to the original path (with a higher RTT) once the attacker stops racing the packets? So I think that calling the attacker an "improvement in routing" is correct.

Contributor

Right, since there will still be duplicate packets from the client and you can't make the client actually change the remote address that it sends to. So even if the attack succeeds, if the attacker stops delivering packets (or even starts delivering them more slowly), I think you would automatically just go back to the original path (as long as the client is still sending).

My understanding here is that this overall change is mostly necessary to prompt traffic from a client that might otherwise have quiesced.
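
The "automatically go back" behaviour described in the two comments above follows from treating the source of the highest-numbered non-probing packet as the active path. A minimal sketch, assuming a hypothetical `ActivePathTracker` class and ignoring path validation entirely:

```python
class ActivePathTracker:
    """Follow the path of the highest-numbered non-probing packet seen."""

    def __init__(self, initial_path: str) -> None:
        self.active_path = initial_path
        self.largest_pn = -1

    def on_packet(self, pn: int, path: str, non_probing: bool = True) -> str:
        # Duplicates and reordered packets (pn not larger than anything seen)
        # and probing packets never move the active path.
        if non_probing and pn > self.largest_pn:
            self.largest_pn = pn
            self.active_path = path
        return self.active_path


tracker = ActivePathTracker("client")
# The attacker's copy of packet 1 wins the race; the genuine copy is a duplicate.
tracker.on_packet(1, "attacker")
tracker.on_packet(1, "client")
during_attack = tracker.active_path
# The attacker stops forwarding; the next genuine packet moves the path back.
tracker.on_packet(2, "client")
after_attack = tracker.active_path
```

This only works while the client keeps sending, which is why the PR adds probing to prompt traffic from a client that has gone quiet.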

Member Author

The risk is that the attacker can convince a peer that the connection is dead, meaning that no more packets are sent. The interaction between validation timers and dead path detection probably needs some work.

Contributor

@erickinnear erickinnear left a comment

There is a fair bit of text here, but overall seems like a good balance of addressing the concern without adding significant complexity. Thanks for putting this together and writing it up!

even if the data matches that sent in the PATH_CHALLENGE. This doesn't result
in path validation failure, as it might be a result of a forwarded packet (see
{{off-path-forward}}) or misrouting. Thus, the endpoint considers the path to
be valid when a PATH_RESPONSE frame is received on the same path with the same
Contributor

We now have three options for path validation: success, failure, and neither. Are there any other places that reference this where we need to update the wording around expected behaviors?

Member Author

We always had that (this is just changing this text to be consistent with the text in other places).
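
The three outcomes discussed here (success, failure, and neither) can be written down as a small state function. This is a sketch based only on the quoted diff context; the enum and function names are hypothetical.

```python
from enum import Enum


class Validation(Enum):
    PENDING = "pending"      # neither succeeded nor failed yet
    SUCCEEDED = "succeeded"
    FAILED = "failed"


def on_path_response(sent_data: bytes, sent_path: str,
                     recv_data: bytes, recv_path: str) -> Validation:
    """A PATH_RESPONSE validates the path only when it carries the same data
    and arrives on the same path as the PATH_CHALLENGE. A response on another
    path (for example a forwarded or misrouted packet) is not a failure."""
    if recv_data == sent_data and recv_path == sent_path:
        return Validation.SUCCEEDED
    return Validation.PENDING


def on_timer_expired(state: Validation) -> Validation:
    """Only the validation timer turns a pending attempt into a failure."""
    return Validation.FAILED if state is Validation.PENDING else state
```

Under this reading, a mismatched response leaves the attempt pending and only the timer can end it in failure.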

process.

Abandoning this validation attempt before it either succeeds or times out
increases exposure to the packet copying attack.
Contributor

Is this a necessary (independent) paragraph?

Maybe this statement should be made more explicit. That is, the server should not abandon or even postpone a pending path validation when a new apparent migration is seen. Consider the case where the copying attacker's path to the server has about the same latency as the client's path. The server will see a random mix of packets from both paths, which results in a rapid sequence of apparent migrations. If each apparent migration causes pending validation attempts to be abandoned, or just causes the timer to be reset, then no validation will succeed and the server will never send another non-probing packet. This is what I observed in picoquic -- successive timer resets caused the server to become completely silent.
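
The starvation described above can be reproduced with a toy simulation. This is not picoquic's logic; it is a simplified model with assumed numbers (packets alternating between two paths every 10 ms, a PATH_RESPONSE taking 25 ms to come back), and the function name is hypothetical.

```python
def paths_validated(arrivals, rtt_ms, abandon_on_migration):
    """Return the set of paths validated while the mixed arrivals continue.

    arrivals: list of (time_ms, path) for received non-probing packets.
    """
    pending = {}       # path -> time at which its PATH_RESPONSE would arrive
    validated = set()
    current = None
    for t, path in arrivals:
        # Complete every validation whose response has arrived by now.
        for p, done_at in list(pending.items()):
            if done_at <= t:
                validated.add(p)
                del pending[p]
        if path != current:              # apparent migration
            current = path
            if abandon_on_migration:
                pending.clear()          # drop all in-flight validations
            if path not in validated and path not in pending:
                pending[path] = t + rtt_ms
    return validated


# Client and attacker paths have similar latency, so packets interleave.
mix = [(0, "client"), (10, "attacker"), (20, "client"),
       (30, "attacker"), (40, "client"), (50, "attacker")]
starved = paths_validated(mix, rtt_ms=25, abandon_on_migration=True)
healthy = paths_validated(mix, rtt_ms=25, abandon_on_migration=False)
```

With abandonment, no validation ever outlives the next apparent migration, so `starved` is empty; letting each attempt run to completion validates both paths.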

restart the timer for a longer period of time.
{{QUIC-RECOVERY}}) may be adequate. For instance, an endpoint might delay
switching to a new congestion control context until it is confirmed that an old
path is no longer needed (for the case in {{off-path-forward}}).
Contributor

Nit: wording

Suggested change
- path is no longer needed (for the case in {{off-path-forward}}).
+ path is no longer needed (such as the case in {{off-path-forward}}).

(or something of that sort)

sending rate. An endpoint might set a separate timer when a PATH_CHALLENGE is
sent, which is cancelled when the corresponding PATH_RESPONSE is received. If
the timer fires before the PATH_RESPONSE is received, the endpoint might send a
new PATH_CHALLENGE, and restart the timer for a longer period of time.
Contributor

I thought we'd already addressed this elsewhere, but I can't find it anymore, so this looks good.
My memory was that we said to retransmit PATH_CHALLENGE based on a timer until you got a response or decide that validation failed, although now we have three states for validation result. If we have that elsewhere (and I just missed it) then we should reconcile with this.

Member Author

This is just the old text, with the paragraph split to allow for the new text. If you have a suggestion here, that would be great, but I can't figure out how to add that without doing a lot of damage to the existing text.
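
The quoted text says the timer is restarted "for a longer period of time" each time a new PATH_CHALLENGE is sent. One possible reading is sketched below, assuming the period doubles each time and that probing stops at an overall validation deadline; both the doubling factor and the function name are assumptions, since the text does not specify how much longer.

```python
def challenge_send_times(initial_timeout_ms: int, give_up_after_ms: int,
                         backoff: int = 2) -> list:
    """Times (ms after the first PATH_CHALLENGE) at which a new challenge is
    sent if no PATH_RESPONSE arrives, until the overall deadline passes."""
    times = [0]
    timeout = initial_timeout_ms
    t = timeout
    while t < give_up_after_ms:
        times.append(t)
        timeout *= backoff   # restart the timer for a longer period
        t += timeout
    return times


# With a 100 ms first timer and a 600 ms validation deadline, challenges go
# out at 0, 100 and 300 ms; the next one would fall after the deadline.
schedule = challenge_send_times(initial_timeout_ms=100, give_up_after_ms=600)
```

Each entry is a new PATH_CHALLENGE rather than a retransmission of the old frame, which matches the note elsewhere in the thread that these frames are not retransmitted.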


A sender can make exceptions for probe packets so that their loss detection is
independent and does not unduly cause the congestion controller to reduce its
sending rate. An endpoint might set a separate timer when a PATH_CHALLENGE is
Contributor

...which makes sense, given that you don't retransmit PATH_CHALLENGE or PATH_RESPONSE, maybe we should note that here or nearby as well?

@janaiyengar
Contributor

I think this is good. I would have preferred less text, but let's get this in and see about that later.

Contributor

@janaiyengar janaiyengar left a comment

a few nits

even if the data matches that sent in the PATH_CHALLENGE. This doesn't result
in path validation failure, as it might be a result of a forwarded packet (see
{{off-path-forward}}) or misrouting. Thus, the endpoint considers the path to
be valid when a PATH_RESPONSE frame is received on the same path with the same
Contributor

I might restructure a bit and split this up into two paras. Move the last sentence ("Thus, the endpoint ... the PATH_CHALLENGE frame.") to before the discussion of non-success (before the sentence "If a PATH_RESPONSE frame ..."). Break para there.

active path using a PATH_CHALLENGE frame. This induces the sending of new
packets on that path. If the path is no longer viable, the validation attempt
will time out and fail; if the path is viable, but no longer desired, the
validation will succeed, but only results in a probing packet being sent on the
Contributor

Suggested change
- validation will succeed, but only results in a probing packet being sent on the
+ validation will succeed, but only results in probing packets being sent on the

An endpoint that receives a PATH_CHALLENGE on an active path SHOULD send a
non-probing packet in response. If the non-probing packet arrives before any
copy made by an attacker, this results in the connection being migrated back to
the original path. Any subsequent migration to another path resets this entire
Contributor

Suggested change
- the original path. Any subsequent migration to another path resets this entire
+ the original path. Any subsequent migration to another path restarts this entire

Labels
-transport design An issue that affects the design of the protocol; resolution requires consensus.
Development

Successfully merging this pull request may close these issues.

Connection migration failure mode
9 participants