proposal: x/net/http2: provide hook for custom frame policers #63518
Comments
While I am not against this proposal, I do not consider it sufficient, because it requires per-app changes to mitigate potential http2 CVEs. The default behavior of the system should prevent abuse. For example, the fix for CVE-2023-44487/CVE-2023-39325 should have 'fixed' the problem for Kube, especially since Kube lets an end user configure
My take for this CVE is that, without changes to the standards, there will always be some set of apps requiring specific changes, because the way it must be mitigated is somewhat coarse and will have cases where it is either too aggressive or not aggressive enough. That means that no matter what, some new API surface area will need to be exposed, and ideally that surface would be flexible enough to not need changes the next time an http2 vulnerability rolls around (i.e. it would be ideal to not expose just a 'maxRsts' config or similar). Whether the proposal is this API with a default policer or this API with no default policer is something I don't have a strong opinion on.
What are the next steps here? Does the Go team agree with this assessment, and is there further follow-up work?
Thanks @dims, I saw that PR - I was curious whether there were going to be further changes to the Go stdlib.
Hi all, I am wondering what the difference is between the current Golang fix and the design behind haproxy's solution.
@skonto There are roughly three common fixes that have been deployed for this issue. From least to most restrictive:
Number 3 isn't a replacement for doing 1 or 2, but it's a valuable addition for a few reasons:
Hi @elindsey, thanks for the insights. Looking at the haproxy comments, it seems that the per-connection maximum is fixed to the maximum number of streams that can be processed:
So this is 1 in your description, and it applies per connection. It seems that the problem of multiple clients each using the maximum number of available streams is not discussed in that fix, but from the later discussion, if I understand correctly, the effect is not considerable as long as a connection is capped at max=100.
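For reference, here is a minimal sketch of applying that kind of per-connection stream cap on the Go side using the existing x/net/http2 configuration surface (no frame-level hook involved). The handler, the listen address, and the cert/key file names are placeholders, and the value of 100 is just the compatibility floor cited in this thread, not a recommendation.

```go
package main

import (
	"io"
	"log"
	"net/http"

	"golang.org/x/net/http2"
)

func main() {
	srv := &http.Server{
		Addr: ":8443", // placeholder listen address
		Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			io.WriteString(w, "ok\n")
		}),
	}

	// Advertise a SETTINGS_MAX_CONCURRENT_STREAMS limit for each HTTP/2
	// connection. Per the discussion above, values much below 100 tend to
	// run into client compatibility issues.
	if err := http2.ConfigureServer(srv, &http2.Server{
		MaxConcurrentStreams: 100,
	}); err != nil {
		log.Fatal(err)
	}

	// cert.pem / key.pem are placeholder paths for the server's TLS material.
	log.Fatal(srv.ListenAndServeTLS("cert.pem", "key.pem"))
}
```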
The proposal is to add a hook for HTTP/2 server connections:
The use case is to permit users to provide their own fine-grained rate limiting policy on HTTP/2 connections. I understand the desire here. However, the proposed API is a very low-level hook into the internals of the HTTP/2 server. This is the sort of feature we have generally avoided providing, and have often come to regret when we do provide. For example, I think it's clear at this point that
A point in this proposal's favor is that, unlike
On the other hand, it seems to me inevitable that someone is going to want the contents of each frame, in which case this hook no longer suffices. I personally would much prefer to provide reasonable default behavior and higher-level APIs than this one.
In particular, if there are good mitigations of this nature that can be applied, I'd be much happier about implementing them directly in the HTTP/2 server so that all users can take advantage of them. That said, I do see the argument that providing the tools to build mitigations permits more experimentation in user code when there isn't an obvious right approach.
@neild what would you recommend that folks building Go servers that run directly on the public internet do to harden against http2 issues? I was able to do kubernetes/kubernetes#121120 for a subset of clients, but kubernetes/kubernetes#121197 remains open with no good solution.
Ideally, if there's additional hardening we can apply, we'd do so in
kubernetes/kubernetes#121197 doesn't have much information about what specific measures you want to take. Can you go into more detail?
Unfortunately I haven't yet seen a scheme that doesn't require both exposing a configuration knob and exposing frame-level metrics in order to configure correctly, so such schemes pretty quickly end up needing a lowish-level hook.

Tomcat is perhaps the most interesting to dig into. They tried to make a more "hands off" policer that tracks the number of good frames and the number of frames it deems 'overhead' (essentially anything that isn't a request). Theoretically this would be less brittle than a simpler rate limiter and require less tuning, but anecdotally it has been difficult to use in practice. Tuning it requires understanding both the client traffic patterns and Tomcat's internal algorithm so that the rates can be translated into Tomcat config. It hasn't been entirely robust to new attack types, requiring the successive addition of specific configs (e.g. overHeadResetFactor, overheadContinuationThreshold), and it has grown to five separate tunables. And perhaps most relevant, tuning it at all still requires frame-level metrics, so it needs some form of lower-level API if you want to avoid making users rely on decrypted pcaps.

Netty, proxygen, and similar proxy toolkits had an easier time because it was more natural to expose lower-level APIs, or they already had APIs in place that let users implement mitigations. My experience with a similar Netty rate limiter, and seeing that it was robust in the face of newer attacks (RST, CONTINUATION, etc.), is what originally led to the ConnectionCalmer() patch. It's definitely a less natural fit for Go's net APIs, but I haven't seen a good or simpler alternative yet (very happy to be wrong).
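To make the Tomcat-style "overhead" idea concrete, here is a rough sketch of what such a policer could look like if it were built on the per-frame hook proposed in this issue. The `FrameInfo`/`OnFrame` shape and the weighting (a HEADERS frame credits the connection, resets and other non-request frames debit it) are assumptions for illustration only, not Tomcat's actual algorithm or an existing Go API.

```go
package main

import (
	"errors"
	"fmt"
)

// FrameInfo mirrors the per-frame data the proposed hook would expose
// (frame type, frame length, stream ID). The type and field names here
// are placeholders, not an existing x/net/http2 API.
type FrameInfo struct {
	Type     uint8
	Length   uint32
	StreamID uint32
}

const (
	frameHeaders      = 0x1
	frameRstStream    = 0x3
	framePing         = 0x6
	frameContinuation = 0x9
)

var errTooMuchOverhead = errors.New("connection exceeded overhead budget")

// overheadPolicer is a simplified, Tomcat-inspired counter: frames that
// look like real requests credit the connection, frames that only create
// work for the server debit it, and the connection is dropped once the
// running balance exceeds a threshold.
type overheadPolicer struct {
	overhead  int
	threshold int
}

// OnFrame would be registered as the per-connection callback; a non-nil
// return value asks the server to terminate the connection.
func (p *overheadPolicer) OnFrame(f FrameInfo) error {
	switch f.Type {
	case frameHeaders:
		p.overhead-- // a HEADERS frame normally opens a request
	case frameRstStream:
		p.overhead += 3 // weight resets more heavily, as in a rapid-reset attack
	case framePing, frameContinuation:
		p.overhead++
	}
	if p.overhead > p.threshold {
		return errTooMuchOverhead
	}
	return nil
}

func main() {
	p := &overheadPolicer{threshold: 10}
	// Simulate a rapid-reset pattern: each HEADERS is immediately cancelled.
	for i := 0; i < 12; i++ {
		streamID := uint32(2*i + 1)
		for _, ft := range []uint8{frameHeaders, frameRstStream} {
			if err := p.OnFrame(FrameInfo{Type: ft, StreamID: streamID}); err != nil {
				fmt.Println("closing connection:", err)
				return
			}
		}
	}
	fmt.Println("connection still within budget")
}
```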
I want to make it so that the cost (CPU/RAM/etc.) for a single authenticated client attempting to DoS the Kube API server is roughly 1:1 between the client and the server. Today, a single authenticated client using many TCP connections with many HTTP/2 streams can cause the Kube API server to perform a disproportionately high amount of work while using only a limited amount of resources on the client side. I am not looking to address concerns related to DDoS.
This is a followup to CVE-2023-44487/CVE-2023-39325.
The fix shipped is a very welcome change and nicely caps the number of handlers in the system. However, it's still insufficient in certain cases, for example Kube and some of our own services. In practice it is difficult to impossible to tweak MAX_CONCURRENT_STREAMS below 100 without hitting client compatibility issues, and even at 100 the rapid reset attack can incur significant resource use, especially in cases where the Go server is a gateway to other heavyweight services.
A number of mitigations have been deployed in other stacks; a non-exhaustive summary:
What all these mitigations have in common is visibility into the types of frames flowing over a connection. Additionally, this level of visibility is necessary to produce metrics and gain insight into which attacks are even being seen, something that is not possible with the current APIs.
The proposal is to add a single callback hook that receives basic frame information scoped to the individual connection: frame type, frame length, and stream ID. The callback's return value may trigger termination of the connection. With this hook, all necessary metrics can be gathered and all mentioned variants of frame policers can be built. The default is unset, so there are no changes to out-of-the-box behavior.
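To illustrate the proposed surface area, here is a rough sketch of what the callback and the metrics use case might look like. The names (`FrameInfo`, `FramePolicer`, `frameMetrics`) and the placement of the hook are hypothetical; nothing like this exists in x/net/http2 today, and the attached patch below is the authoritative version of the idea.

```go
// Package http2sketch is a standalone illustration only; x/net/http2 does
// not expose a hook like this today.
package http2sketch

import (
	"errors"
	"sync/atomic"
)

// FrameInfo carries the per-frame data described in the proposal:
// frame type, frame length, and stream ID.
type FrameInfo struct {
	Type     uint8
	Length   uint32
	StreamID uint32
}

// FramePolicer is the hypothetical per-connection callback. It is called
// once for every frame received on the connection; returning a non-nil
// error asks the server to terminate the connection.
type FramePolicer func(FrameInfo) error

// frameMetrics shows the observability half of the use case: counting
// frames by type so operators can see which attack patterns actually hit
// the server, without resorting to decrypted packet captures.
type frameMetrics struct {
	counts [256]atomic.Uint64 // indexed by frame type
}

// NewPolicer returns a FramePolicer that records per-type frame counts and
// enforces a crude lifetime frame budget for the connection.
func (m *frameMetrics) NewPolicer(maxFrames uint64) FramePolicer {
	var total atomic.Uint64
	return func(f FrameInfo) error {
		m.counts[f.Type].Add(1)
		if total.Add(1) > maxFrames {
			return errors.New("per-connection frame budget exceeded")
		}
		return nil
	}
}
```

Presumably the hook would be plumbed through the server configuration so that a fresh policer can be created for each accepted connection; the sketch leaves that wiring out.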
Attached is a patch that we deployed to build our mitigation for the CVE -
0001-introducing-ConnectionCalmer.patch