Additional security section on fragmentation reassembly attacks #444

Merged
merged 9 commits into from Apr 24, 2017

Contributor

@huitema commented Apr 19, 2017

Describe the equivalent of the Teardrop attack for QUIC, and propose mitigation.

I have lots of emotional things to say about such checks...

Member

@martinthomson left a comment

This doesn't talk about flow control. It needs to.

Like #443, I think that this is far more detailed than we need. The point of this is to make an implementer aware that a malicious peer might intentionally fragment the data on receive buffers in order to cause disproportionate memory commitment (either disproportionate to the number of bytes that were transmitted, or disproportionate to the flow control offset that was provided, in practice probably both are necessary to make the attack worthwhile). This can be said more concisely, I think.

The most interesting case for this attack is where receivers over-commit memory and advertise flow control offsets in the aggregate that exceed actual available memory. This strategy works in most cases given that most clients are not attempting denial of service. The very tail of a receive window is rarely needed in practice. Over-commitment fails badly when under this kind of attack.

An adversarial client may attempt to
exhaust server memory resource by performing
a stream fragmentation and reassembly attack, similar to the UDP/ICMP
"Teardrop" fragmentation attacks. The adversarial client would open a stream,

Member

citation?

Contributor Author

Just dropping the name quoting. Could not find a good Teardrop reference.

@@ -2697,6 +2697,43 @@ also be forward-secure encrypted. Since the attacker will not have the forward
secure key, the attacker will not be able to generate forward-secure encrypted
packets with ACK frames.

## Stream fragmentation and reassembly attacks

Member

Title Case

An adversarial client may attempt to

Member

This is any endpoint, though I agree that it's (usually) not very interesting for a server to mount the attack.

Contributor Author

OK. Rewriting to "endpoint" instead of client.

This attack can be mitigated by not
committing memory for stream data reassembly,
and simply keeping the STREAM DATA frames until enough fragments have been
received and the data can be delivered to the application in proper sequence.

Member

Saving STREAM frames only works if the data provided is sufficiently sparse; at some point the overhead of saving the frames exceeds the overhead of assembling the data into a buffer and tracking the holes.

The real mitigation is not to over-commit on flow control.

Contributor

Saving frames is the only way the connection-level flow control window makes sense. Otherwise, you'd have to commit (number of streams)*(stream flow control window) memory.
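
To make the arithmetic in this comment concrete, here is a minimal sketch. The function name, the stream count, and the window sizes are illustrative assumptions, not values from the draft. It compares the worst-case memory a receiver commits if it preallocates a reassembly buffer per stream against the payload bound when it only saves received frames (which the connection-level window caps). The second figure counts payload bytes only and ignores per-frame bookkeeping.

```python
def worst_case_commitment(num_streams: int, stream_window: int, connection_window: int) -> dict:
    """Compare two receive strategies (hypothetical helper, payload bytes only)."""
    # Preallocating one buffer per stream commits the full stream window each time.
    preallocated = num_streams * stream_window
    # Saving frames as they arrive never holds more payload than the
    # connection-level window allows the peer to send.
    saved = min(preallocated, connection_window)
    return {"preallocated_buffers": preallocated, "saved_frames": saved}


# Illustrative numbers only: 100 streams, 64 KiB per stream, 1 MiB per connection.
example = worst_case_commitment(
    num_streams=100,
    stream_window=64 * 1024,
    connection_window=1024 * 1024,
)
print(example)
```

With these assumed numbers the per-stream strategy commits several times the connection window, which is the gap the comment points at.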

Member

Hmm, that's true. That suggests a different way to write this: assume that frames are saved (and maybe merged opportunistically). Then the attack is on the overhead associated with saved frames.
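
A rough sketch of the "save frames and merge opportunistically" model described here, to show where the overhead comes from. The class name `SavedFrames` and the policy that newer data overwrites older data on overlap are assumptions made for illustration; the draft does not prescribe either.

```python
class SavedFrames:
    """Out-of-order stream data kept as sorted, non-adjacent (offset, data) ranges."""

    def __init__(self):
        self.ranges = []  # list of (start_offset, bytearray), sorted by start_offset

    def add(self, offset: int, data: bytes) -> None:
        start, end = offset, offset + len(data)
        merged = bytearray(data)
        kept = []
        for r_start, r_data in self.ranges:
            r_end = r_start + len(r_data)
            if r_end < start or r_start > end:
                kept.append((r_start, r_data))  # disjoint: keep as a separate range
                continue
            # Overlapping or adjacent: fold the stored range into the new one.
            new_start, new_end = min(start, r_start), max(end, r_end)
            buf = bytearray(new_end - new_start)
            buf[r_start - new_start:r_end - new_start] = r_data
            buf[start - new_start:end - new_start] = merged  # newer data wins (assumption)
            start, end, merged = new_start, new_end, buf
        kept.append((start, merged))
        kept.sort(key=lambda r: r[0])
        self.ranges = kept


# An adversarial pattern: one byte at every other offset never merges.
sparse = SavedFrames()
for off in range(0, 200, 2):
    sparse.add(off, b"x")
sparse_ranges = len(sparse.ranges)  # one stored range per payload byte

# A well-behaved sender eventually fills the gaps and the ranges collapse.
for off in range(1, 200, 2):
    sparse.add(off, b"y")
filled_ranges = len(sparse.ranges)
print(sparse_ranges, filled_ranges)
```

In the sparse phase every payload byte costs one stored range, so whatever fixed bookkeeping a real implementation attaches to a range dominates the memory actually used.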

Contributor

Depending on what is meant by using up system memory, an attack may focus on locking out other connections, new or existing, by forcing low congestion windows. Avoiding over-commitment can make this situation worse if the attacker successfully increases flow control budgets.

For practical large-scale solutions, the implementation needs to over-commit very significantly. There can be 100K connections of which only a few hundred are active. Each of these needs write capacity immediately. If it does not get it, responsiveness suffers and resources are tied up by having more concurrent active work going on. The same applies to the number of streams versus active streams in some use cases, while other cases normally expect streams to be either active or closed. If each connection classified as active gets a connection-level budget, it doesn't really matter what the stream budget is; that is more for application consumption management. If the connection budget is abused with holes, it just hurts the throughput of the sender and could limit the ability to start new streams. The real problem is to decide which connections are active and which are sleeping without preventing fast ramp-up of new and sleeping connections, and how to throttle back when connections are no longer active. The attack is then to appear active while committing the least possible sender resources. A heuristic could be the age of holes. If retransmission does not kick in in a timely manner, packets could be dropped deliberately on that connection despite it having a reasonable connection-level budget.
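
The "age of holes" heuristic mentioned at the end of this comment could be sketched roughly as follows. Everything here is an assumption for illustration: the `HoleTracker` name, the one-second threshold, and the idea of passing the clock in explicitly. A real receiver would presumably tie the threshold to the connection's retransmission timing.

```python
from dataclasses import dataclass, field


@dataclass
class HoleTracker:
    """Remembers when each reassembly hole was first seen (hypothetical helper)."""

    max_hole_age: float  # seconds a hole may stay open before it looks suspicious
    holes: dict = field(default_factory=dict)  # (start, end) -> time first observed

    def observe_hole(self, start: int, end: int, now: float) -> None:
        # Keep the earliest observation time if the hole is already known.
        self.holes.setdefault((start, end), now)

    def fill_hole(self, start: int, end: int) -> None:
        self.holes.pop((start, end), None)

    def stale_holes(self, now: float) -> list:
        return [h for h, seen in self.holes.items() if now - seen > self.max_hole_age]


tracker = HoleTracker(max_hole_age=1.0)
tracker.observe_hole(0, 100, now=0.0)
tracker.observe_hole(200, 300, now=0.8)

stale_before = tracker.stale_holes(now=1.5)  # only the older hole is overdue
tracker.fill_hole(0, 100)                    # a retransmission arrived
stale_after = tracker.stale_holes(now=1.5)
print(stale_before, stale_after)
```

A connection with stale holes is one where retransmission has not kicked in, which is the signal the comment suggests using before deliberately dropping packets.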

Contributor

@mikkelfj Apr 20, 2017

I should perhaps clarify that in the above, a connection-level budget is not a linear function of flow control. There is a fixed amount of internal memory, and as that is released, the congestion window is expanded. So storing lots of fragments will use memory faster and release memory more slowly, and thus reduce the connection-level congestion window. And when the memory fills, packets start to drop. In this way, the worst case is that the full budget is consumed with holes, whereas a friendly peer would fill the same budget with linear stream data. The adversary can only create so many holes before hole punching becomes more expensive than sending linear data. Of course, there are endless ways this can be handled, and it depends on the use case, risk, degradation when not under attack, etc.; therefore it is hard to provide general advice.

Contributor Author

As you say, Mikkel, "it is hard to provide general advice". So maybe that's how we should rewrite the "advice" part of this PR. Something like: "It is hard to provide general advice. QUIC deployments SHOULD provide mitigations against the stream fragmentation attack, which MAY be avoiding over-committing memory, delaying reassembly of STREAM DATA frames, or implementing heuristics based on the age and duration of reassembly holes."

Contributor Author

huitema commented Apr 20, 2017

Shortened the text, added reference to flow control. The point is that (some) receivers will over-commit, and will need to mitigate the attack. This will require some kind of heuristic. I proposed one: counting holes and, if they are not commensurate with the packet loss rate, aborting the connection. If you believe there is something smarter to do, please chime in.
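
One possible shape for the hole-counting heuristic proposed here, as a sketch only. The function name, the `slack` multiplier, the `min_holes` floor, and the assumption that each lost packet opens roughly one hole are all hypothetical choices, not text from the PR.

```python
def holes_exceed_loss(holes: int, packets_received: int, loss_rate: float,
                      slack: float = 4.0, min_holes: int = 16) -> bool:
    """Return True when open holes are out of proportion to observed loss.

    Assumes each lost packet opens about one hole; `slack` and `min_holes`
    are illustrative tolerances to avoid penalizing bursty but honest peers.
    """
    expected_holes = loss_rate * packets_received
    return holes > max(min_holes, slack * expected_holes)


# Illustrative numbers: 1000 packets received at an estimated 1% loss rate.
honest_peer = holes_exceed_loss(holes=12, packets_received=1000, loss_rate=0.01)
suspicious_peer = holes_exceed_loss(holes=300, packets_received=1000, loss_rate=0.01)
print(honest_peer, suspicious_peer)
```

A receiver using something like this would abort (or throttle) the connection when the function returns true; how aggressive the tolerances should be is exactly the open question in this thread.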


Aron-Schats commented Apr 20, 2017 via email

Contributor Author

huitema commented Apr 21, 2017

Aron: "why would you have to commit more than the connection flow control window?" It happens if the sum of the per-stream windows is larger than the congestion window. For example, when the endpoint cannot predict which of the streams the other endpoint will fill first.

Since there is no "one size fits all" mitigation, simplify the recommendations. The point is to draw attention to the problem, and trust developers to do the right thing.

Contributor Author

huitema commented Apr 21, 2017

Modified the mitigations part. Martin, I think that the new text addresses your review. Can you give it a look? Thanks.

Member

@martinthomson left a comment

Yep, looks fine. I'll give others a chance to poke at this a little before merging.

The attack is mitigated if flow control windows correspond to
available memory. However, some receivers will over-commit memory and advertise
flow control offsets in the aggregate that exceed actual available memory.
The over-commitment strategy may leads to better performance when

Member

"may leads" -> "can lead"

the stream fragmentation attack.

QUIC deployments SHOULD provide mitigations against the stream fragmentation
attack. Mitigations MAY consist of avoiding over-committing memory, delaying

Member

Don't use "MAY" here, it's not permissive. "could" is fine.


QUIC deployments SHOULD provide mitigations against the stream fragmentation
attack. Mitigations MAY consist of avoiding over-committing memory, delaying
reassembly of STREAM DATA frames, implementing heuristics based on the

Member

"STREAM frames"

Contributor Author

OK, I believe I fixed all that...

@martinthomson merged commit b47c030 into quicwg:master Apr 24, 2017
@martinthomson mentioned this pull request Apr 28, 2017
5 participants