HTTP3 / QUIC support #4723
Comments
---
Thanks, @rbtcollins. I agree that this would be a cool feature, but I'm less convinced that it will add much operational value, especially relative to the cost of implementing it (which includes much more than the proxy changes you referenced).
Presumably this is all in the service of latency? Does QUIC have a concrete benefit within a cluster, where inter-pod connections are long-lived and multiplex an arbitrary number of streams? The connection handshake cost should be amortized in these cases. What kind of concrete savings might we expect? I think QUIC makes a lot more sense in the use case I understand it to be designed for: interactions with many clients that have short-lived connections... which is basically an ingress problem. I think we could get a much simpler plan for supporting HTTP/3 at the ingress level, but I'm not convinced that HTTP/3 offers a whole lot of value within the service mesh.
---
tl;dr: All of the above is very relevant in a datacentre, and given that a service mesh can easily extend across datacentres (e.g. two AZs in Azure are two distinct datacentres), it is definitely relevant within a service mesh.

Longer answer: Hi @olix0r, I think there are a few things to consider here. Firstly though, the work - yes, the CNI layer needs tweaking; but no, arbitrary UDP handling is not needed, since QUIC has a well-defined port: 443. QUIC v1 uses TLS, though other crypto types could be defined in future; I detailed that in the context of an mTLS service mesh in my bug report - so yes. Linkerd supports inter-cluster as well as intra-cluster scenarios, so I'm not sure why you would want to exclude large meshes from consideration. There is no handshake cost to amortise: QUIC has 1-RTT handshakes. And where H2 prevents HOL blocking, QUIC prevents window stalling and retransmission stalls across streams when packet loss occurs. For background on what QUIC is designed for:
https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46403.pdf is a good read on QUIC before IETF involvement; a lot has happened since then, and https://blog.cloudflare.com/http-3-from-root-to-tip/ is another good read.
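To make the 1-RTT and stream-independence points above concrete, here is a minimal sketch of a QUIC client using the quinn crate. Treat it as illustrative only: the thread mentions quiche and hyper's h3 work rather than quinn, the API shown is the ≈0.10-era one, and the address and server name are placeholders.

```rust
// Sketch only, assuming quinn ~0.10; address and server name are made up.
use std::error::Error;

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    // One UDP socket hosts the endpoint; the TLS 1.3 handshake rides inside
    // QUIC, so a fresh connection is ready after a single round trip.
    let mut endpoint = quinn::Endpoint::client("0.0.0.0:0".parse()?)?;
    endpoint.set_default_client_config(quinn::ClientConfig::with_native_roots());

    let conn = endpoint
        .connect("203.0.113.10:443".parse()?, "example.internal")?
        .await?;

    // Open several streams over the same connection. Unlike HTTP/2 over TCP,
    // a lost packet only stalls the stream whose data it carried.
    for i in 0..3u32 {
        let (mut send, _recv) = conn.open_bi().await?;
        send.write_all(format!("request {i}").as_bytes()).await?;
        send.finish().await?;
    }
    Ok(())
}
```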
---
I don't mean to insinuate that QUIC isn't valuable. We're definitely open to the feature, but not at the expense of the other things we need to be doing. I have to choose between working on this and the other things on our roadmap, and this strikes me as a lot of work for something that will only benefit the most specialized, bleeding-edge applications. There's other work we can do that will have a bigger impact in a shorter time frame. However, if this is something you're passionate about, we'd be happy to help you (or anyone reading this) through the process of contributing to the project. We'd probably want to start with an RFC. I'm especially keen to get a better understanding of how we could support QUIC transparently without weakening the security of applications by having to man-in-the-middle TLS. Thanks for taking the time to write up this issue, @rbtcollins. It's appreciated.
---
Ok, cool - I do appreciate that triage is an important thing; I'm trying to be clear about the ways in which H3 is useful even within high-speed cloud environments. For context: we're in the process of figuring out which of the plethora of service mesh options to spend time really investigating, because (Highlander voice) there can be only one. This, SPIFFE, and multi-cluster are three things I think are going to be particularly important for us over the next 6-18 months, due to the way our applications are architected and operate. I am on Slack if you'd like to chat more directly about anything here. In the short term, though, our goal is to make a choice about where to invest rather than to do the investment - so I think writing an RFC would be premature, though it is certainly something I can imagine our infra team taking on down the road if we pick Linkerd.
---
@rbtcollins Thanks! This is helpful context. One question that might drastically simplify this problem: how important is it that your application actually speaks HTTP/3 on the pod-local hop to the proxy? I.e., would it be suitable for the proxy to use HTTP/3 between pods but to rely on HTTP/2 for the pod-local communication? I think this could give you the reliability improvements you're seeking without us having to incur the complications of being a transparent QUIC proxy. Would that work for you?
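A toy sketch of the split being proposed (not Linkerd code; the decision rule and names are purely illustrative): the proxy keeps HTTP/2 for loopback traffic to the local app and reserves HTTP/3 for the pod-to-pod hop.

```rust
use std::net::IpAddr;

// The wire protocols the proxy could pick from, per hop.
#[derive(Debug, PartialEq)]
enum WireProto {
    Http2, // pod-local: app <-> proxy over loopback
    Http3, // pod-to-pod: proxy <-> proxy across the cluster network
}

fn pick_proto(peer: IpAddr) -> WireProto {
    if peer.is_loopback() {
        WireProto::Http2
    } else {
        WireProto::Http3
    }
}

fn main() {
    assert_eq!(pick_proto("127.0.0.1".parse().unwrap()), WireProto::Http2);
    assert_eq!(pick_proto("10.1.2.3".parse().unwrap()), WireProto::Http3);
}
```

In this split, applications keep their existing HTTP/1.1 or HTTP/2 stacks untouched; only the proxy-to-proxy hop changes wire protocol.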
---
Packet loss on loopback traffic is going to be exceedingly rare, so I suspect that configuration would work well.
---
Just noting that it sounds like the hyper maintainers have been looking at adding HTTP/3 support. When that happens, this integration should get much easier.
---
I agree that H3 support is a lot of work and will benefit only a few of us. In my case, for example, I'd like to be able to have metrics and routing control over UDP packets in my mesh.
---
There's some progress on hyper's QUIC integration here, so this will become more feasible over the next few months.
I'm a bit skeptical about the value of UDP support without application/session-protocol awareness. Most UDP protocols (like QUIC or SIP) implement their own state on top of UDP datagrams, so we can't blindly load balance or route UDP datagrams without breaking the application. And on the metrics side of things, we could only tell you the size distribution and number of UDP datagrams sent between endpoints and nothing else - and I believe you can get essentially all of that information from the Linux kernel already.
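To illustrate the metrics limitation described above: without protocol awareness, all a UDP proxy can observe is datagram counts and sizes per peer. A self-contained sketch (the bind address and the 100-datagram loop bound are arbitrary):

```rust
use std::collections::HashMap;
use std::net::{SocketAddr, UdpSocket};

fn main() -> std::io::Result<()> {
    let sock = UdpSocket::bind("127.0.0.1:9999")?;
    // Per-peer (datagram count, total bytes) -- the only stats available
    // without parsing the payload.
    let mut stats: HashMap<SocketAddr, (u64, u64)> = HashMap::new();
    let mut buf = [0u8; 65535];
    for _ in 0..100 {
        // Blocks until a datagram arrives.
        let (n, peer) = sock.recv_from(&mut buf)?;
        let entry = stats.entry(peer).or_insert((0, 0));
        entry.0 += 1;
        entry.1 += n as u64;
    }
    for (peer, (count, bytes)) in &stats {
        println!("{peer}: {count} datagrams, {bytes} bytes"); // all we can say
    }
    Ok(())
}
```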
---
@olix0r thanks for the reply.
I agree with you that any protocol that keeps its own state on top of UDP needs full protocol support.
---
My use case for UDP:
I would be content with basic UDP support, where only proxying is supported but load balancing and session management are not. My UDP use cases are all point-to-point, where I don't need load balancing or session support at the protocol level, and I'm (usually) multiplexing multiple services over a single connection. A baseline of proxy-only support is more than enough for my use cases for quite a while. I don't think advanced support for HTTP/3 would be required upfront if load balancing is handled based on client IP and port for consistency (sketched below). HTTP/3 is a complex beast; I can completely understand the skepticism!
That's fine for me!
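A minimal sketch of that "proxy-only" mode, with the client-(IP, port) hashing suggested above: each client flow consistently maps to one backend, and the datagrams are never parsed. The backend addresses are made up, and a real relay would also need a reverse path mapping backend replies back to clients, which is elided here.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::net::{SocketAddr, UdpSocket};

// Pick a backend by hashing the client's (IP, port), so a given client
// flow always lands on the same backend -- no session state required.
fn pick_backend(client: SocketAddr, backends: &[SocketAddr]) -> SocketAddr {
    let mut h = DefaultHasher::new();
    client.hash(&mut h);
    backends[(h.finish() as usize) % backends.len()]
}

fn main() -> std::io::Result<()> {
    let backends: Vec<SocketAddr> = vec![
        "10.0.0.1:5000".parse().unwrap(), // hypothetical endpoints
        "10.0.0.2:5000".parse().unwrap(),
    ];
    let sock = UdpSocket::bind("0.0.0.0:5000")?;
    let mut buf = [0u8; 65535];
    loop {
        let (n, client) = sock.recv_from(&mut buf)?;
        // Pure datagram relay: the payload is never inspected.
        sock.send_to(&buf[..n], pick_backend(client, &backends))?;
    }
}
```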
---
My company recently started to experiment with QUIC as well. If we decide that QUIC provides concrete benefits for us, we will face a significant decision here.
---
Are there any updates on this feature request? It seems that H3/QUIC is unlikely to be supported by Linkerd in the near term?
---
We've kicked this around among the maintainers, and ultimately we think that now is not the time for H/3 in Linkerd. It's very clear that HTTP/3 is of great benefit for clients outside the cluster talking to workloads in the cluster, but while there's been some experimentation with HTTP/3 inside the cluster, so far no one seems to actually have a clear use case for it there. If we're wrong and someone here definitely does have a clear use case, it would be lovely to hear from you!! 🙂 For now, we're going to close this one – reopen if necessary!!
What problem are you trying to solve?
HTTP/3 support is currently missing from linkerd2.
HTTP/3 is defined here - https://datatracker.ietf.org/doc/draft-ietf-quic-http/?include_text=1 - and while not yet fully ratified, the bones of it are very clear: it is a UDP-based protocol built on QUIC.
QUIC is defined here: https://datatracker.ietf.org/doc/draft-ietf-quic-transport/?include_text=1
#4023, which has been closed, sat idle because TLS-by-default is involved; #3190 is tracking that work.
How should the problem be solved?
There now seems to be a decent answer for the plaintext-to-TLS use cases, but not for the TLS-interception cases, which QUIC interception strictly requires: that needs certificate cooperation between every component, and that's before we talk about external metrics - which, because of complexities like certificate pinning, really should be kept entirely separate.
I'd like to see linkerd2 use https://github.com/cloudflare/quiche or one of the other Rust H3 libraries to handle H3 traffic, provide metrics on it, load balance across endpoints, and so on.
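For a sense of scale, a quiche-based handler would start from something like the following (quiche ≈0.17 API; the certificate paths are placeholders, and the real work - the UDP event loop, the connection map, and the h3 layer on top - is elided):

```rust
// Sketch: building a quiche config that advertises HTTP/3 via ALPN.
fn make_config() -> Result<quiche::Config, quiche::Error> {
    let mut config = quiche::Config::new(quiche::PROTOCOL_VERSION)?;
    config.set_application_protos(&[b"h3"])?; // negotiate HTTP/3
    config.load_cert_chain_from_pem_file("proxy-cert.pem")?; // placeholder path
    config.load_priv_key_from_pem_file("proxy-key.pem")?; // placeholder path
    config.set_max_idle_timeout(30_000); // milliseconds
    Ok(config)
}

fn main() {
    match make_config() {
        Ok(_) => println!("quiche config ready"),
        Err(e) => eprintln!("config error: {e:?}"),
    }
}
```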
Any alternatives you've considered?
Cross fingers and hope every client is fully featured and robust against everything that goes wrong in k8s, including endpoint hotspotting, etc.?
How would users interact with this feature?
Since H3 does not require client certificates, this only requires that services have valid certs; mTLS is already a superset of this capability. As long as we can ensure that the local linkerd2 process can identify itself as the endpoint the client is connecting to, no pod configuration changes should be needed.
Similarly, we already have the service DNS name, so there should basically be no user interaction at all, except perhaps toggling a feature flag to enable this in their Linkerd environment.
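As a sketch of the identity requirement above (the proxy presenting a certificate for whatever service name the client dialed), an SNI-keyed certificate resolver along these lines would do. This uses rustls (≈0.21 API) with hypothetical map contents; QUIC v1 carries the same TLS ClientHello, so the same selection logic applies there.

```rust
use std::{collections::HashMap, sync::Arc};

use rustls::server::{ClientHello, ResolvesServerCert};
use rustls::sign::CertifiedKey;

// One certificate per service identity the proxy may answer for.
struct PerServiceCerts {
    // Keys are SNI names, e.g. "web.ns.svc.cluster.local" (illustrative).
    by_sni: HashMap<String, Arc<CertifiedKey>>,
}

impl ResolvesServerCert for PerServiceCerts {
    fn resolve(&self, hello: ClientHello) -> Option<Arc<CertifiedKey>> {
        // Present the cert matching the name the client asked for; if the
        // proxy holds no cert for that name, the handshake fails cleanly.
        self.by_sni.get(hello.server_name()?).cloned()
    }
}
```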