Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/http2: add Framer.WriteDataN method #66655

Open
dfawley opened this issue Apr 2, 2024 · 22 comments
Open

x/net/http2: add Framer.WriteDataN method #66655

dfawley opened this issue Apr 2, 2024 · 22 comments

Comments

@dfawley
Copy link

dfawley commented Apr 2, 2024

Proposal Details

To write a data frame, the x/net/http2.Framer requires the data contents to first be accumulated into a single contiguous buffer. However, this is not always possible without requiring a copy. See this real-life example where a copy is required in order to construct the frame's contents before writing it. If we had an API like the following, this copy could be avoided:

func (f *Framer) WriteDataN(streamID uint32, endStream bool, data [][]byte) error

The implementation could be as simple as changing the append here into an append that loops over the slice of buffers instead of just the one buffer.

The gRPC code could then be changed to remove the copy and pass the 5-byte header buffer and the data buffer directly to the new method.

I'd also love to discuss other changes that could make the HTTP/2 Framer even more zero-copy friendly by transferring buffer ownership instead of using io.Reader/io.Writer, but the proposal above would be a simple step that provides real incremental performance.

cc @PapaCharlie, @atollena

@gopherbot gopherbot added this to the Proposal milestone Apr 2, 2024
@ianlancetaylor ianlancetaylor moved this to Incoming in Proposals Apr 3, 2024
@ianlancetaylor
Copy link
Member

CC @neild

@PapaCharlie
Copy link

I think this would need slight tweaks. Namely, from the docs of WriteData:

It is the caller's responsibility not to violate the maximum frame size

In this case, the new WriteDataN should read as many bytes as it can from the given [][]byte up to the frame size limit, and return the number of bytes sent. The caller can then remove the necessary []byte slices from the data slice to reflect what remains to be sent, and invoke WriteDataN again. Alternatively, WriteDataN could manage the frames itself and simply chunk up the incoming data accordingly. Either way works.

@neild
Copy link
Contributor

neild commented Apr 3, 2024

I think my first question here is whether the better approach is for gRPC to copy http2.Framer and evolve it independently.

As you point out, the Framer interface has performance limitations from routing all data through an io.Reader/io.Writer. The framer is also not doing much that's particularly complicated or interesting; HTTP/2 frames aren't all that complicated. Rather than taking a dependency on x/net/http2 just for the framer, using something purpose-built for gRPC's needs would be one less dependency and avoid the need to coordinate changes to it.

@dfawley
Copy link
Author

dfawley commented Apr 3, 2024

In this case, the new WriteDataN should read as many bytes as it can from the given [][]byte up to the frame size limit, and return the number of bytes sent. The caller can then remove the necessary []byte slices from the data slice to reflect what remains to be sent, and invoke WriteDataN again. Alternatively, WriteDataN could manage the frames itself and simply chunk up the incoming data accordingly. Either way works.

Either of these would be nice, but not mandatory IMO. We can own the logic to pass the appropriate slices and maintain the size constraints ourselves. (I'm not even sure if the framer is currently aware of the max frame size, so this might be required anyway.)

I think my first question here is whether the better approach is for gRPC to copy http2.Framer and evolve it independently.

That's not an unreasonable option. On the other hand, if there are bugs or vulnerabilities discovered, it's nice when code is centralized to also centralize the fixes to those problems. Similarly, if a more performant methodology could be found and agreed upon, then those benefits can be shared by everyone using a standard(ish) library.

@neild
Copy link
Contributor

neild commented Apr 5, 2024

On the other hand, if there are bugs or vulnerabilities discovered, it's nice when code is centralized to also centralize the fixes to those problems. Similarly, if a more performant methodology could be found and agreed upon, then those benefits can be shared by everyone using a standard(ish) library.

By that argument, gRPC should be using the entire HTTP/2 implementation from x/net/http2, not just the framer. Perhaps that would have been a good idea, but that ship sailed long ago.

The proposal here (WriteDataN) is fairly limited, but as you point out, Framer's use of io.Reader/io.Writer is a fundamental limitation on performance. Fixing that will amount to a complete rewrite of the Framer API, and I think that points at why it makes sense for gRPC to have its own framer implementation--performance and API design are inextricably linked, and committing to an exported API necessarily limits future performance improvements.

@dfawley
Copy link
Author

dfawley commented Apr 5, 2024

By that argument, gRPC should be using the entire HTTP/2 implementation from x/net/http2, not just the framer. Perhaps that would have been a good idea, but that ship sailed long ago.

Brad tried that before I even joined the team. It's still in the code, and some users use it. I would love to live in that world, but alas, it's not this one. I'm not sure whether the net/http server supports all the things gRPC needs to support, like BDP estimation, configurable keepalive pings, max connection age, etc, but if it doesn't, I am guessing it would be even more painful to add those features than adding a simple, optional function to the framer.

The proposal here (WriteDataN) is fairly limited, but as you point out, Framer's use of io.Reader/io.Writer is a fundamental limitation on performance. Fixing that will amount to a complete rewrite of the Framer API, and I think that points at why it makes sense for gRPC to have its own framer implementation--performance and API design are inextricably linked, and committing to an exported API necessarily limits future performance improvements.

We don't have the resources to embark on a full framer rewrite at this time. So our choices are to fork it and add this simple method ourselves, or to add it to x/net/http2. I'd really rather not fork the whole package for something so small. If you're saying that's the only path forward, we will consider it, but it feels wrong.

This is a very trivial extension of the existing API and would not impose any new limitations of future work.

@dfawley
Copy link
Author

dfawley commented Apr 12, 2024

Is there a path forward here? If I send a CL, would it be considered?

@neild
Copy link
Contributor

neild commented Apr 16, 2024

An alternate possibility is that you add this one function to your own implementation:

func WriteDataN(w io.Writer, streamID uint32, endStream bool, bufs [][]byte) error {
        size := 0
        for _, buf := range bufs {
                size += len(buf)
        }
        flags := byte(0)
        if endStream {
                flags |= 0x1
        }
        if _, err := w.Write([]byte{
                byte(size >> 16),
                byte(size >> 8),
                byte(size),
                0x00, // type
                flags,
                byte(streamID >> 24),
                byte(streamID >> 16),
                byte(streamID >> 8),
                byte(streamID),
        }); err != nil {
                return err
        }
        for _, buf := range bufs {
                if _, err := w.Write(buf); err != nil {
                        return err
                }
        }
        return nil
}

~20 lines, fairly straightfoward. (I did omit checking that the data size is < 2^24, so a few more lines for that.) This requires that you have access to the framer's io.Writer, and it makes multiple writes rather than a single one, so that Writer should be buffered. From a quick glance at gRPC's use of Framer, I think that you do have access to a buffered io.Writer at the point of writing data, but I might be missing something.

This will also be rather more efficient than adding a Framer method, since Framer buffers an entire frame before writing it. This is unnecessary when the underlying writer is buffered.

I might modify the function signature a bit, though, to avoid the need to build a [][]byte at all:

// WriteDataHeader writes the header of a DATA frame containing size bytes.
// The caller is responsible for writing size bytes of data after the header.
func WriteDataHeader(w io.Writer, streamID uint32, endStream bool, size int) error

Another question is whether it's really all that useful to combine these writes into one frame; the per-frame overhead is 8 bytes, which isn't nothing, but splitting one chunk of data across two frames doesn't seem all that expensive. I'll assume you've measured this, but if you haven't it might be worth thinking about.

@rsc rsc changed the title proposal: x/net/http2: Add Framer.WriteData variant that supports multiple data buffers proposal: x/net/http2: add Framer.WriteDataN method Apr 24, 2024
@rsc rsc moved this from Incoming to Active in Proposals Apr 24, 2024
@rsc
Copy link
Contributor

rsc commented Apr 24, 2024

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc
Copy link
Contributor

rsc commented May 8, 2024

@neild what do you think the right outcome here is?

@neild
Copy link
Contributor

neild commented May 8, 2024

I'd like to know whether the WriteDataN function above (#66655 (comment)), which requires no API modifications, is sufficient for gRPC's needs. If it is, then we don't need to make a change to the http2 package.

If we do make an API change, I'd prefer:

// WriteDataHeader writes the header of a DATA frame containing size bytes.
// The caller is responsible for writing size bytes of data after the header.
func (f *Framer) WriteDataHeader(streamID uint32, endStream bool, size int) error

This should be more adaptable and should perform better, since it avoids an unnecessary copy of the data.

@dfawley
Copy link
Author

dfawley commented May 8, 2024

The function proposed above (WriteDataN) is just implementing the framer logic outside of the framer. That doesn't make sense to me. Yes, it works, but so does not using this package at all.

I think the question that needs answering is: what is the purpose of this library?

  1. It's here for the stdlib http implementation to use. It happens to be exported because of a historical accident, but anything not directly required by the stdlib will not be implemented.

  2. It's a basic, functional, minimal implementation. Anyone looking for better performance should look elsewhere.

  3. It's a good implementation. It will be maintained and extended, within reason, to improve usability and provide the best performance possible.

  4. ??

The alternative proposed (WriteDataHeader) SGTM.

@neild
Copy link
Contributor

neild commented May 8, 2024

I don't have the historic context on how Framer ended up here. Perhaps @bradfitz can chime in if he's interested.

From my perspective:

  1. Framer is an exported API. It works, and we will fix bugs in it.
  2. Framer is not the API I would design if performance is a goal. It's fundamentally limited by the use of io.Writer and io.Reader. The ReadFrame method returning an interface means you need to do complicated caching stuff to avoid each read allocating. None of these problems can be fixed within the context of the existing API.
  3. HTTP/2 frames aren't all that complicated. Framer doesn't much that's exciting. (HPACK decoding is complicated, but that's a separate package.)
  4. If I was maintaining gRPC, I would drop the Framer dependency. It's a deep internal detail of the implementation, and I would not want to route changes to it through the Go proposal process.

Anyway, I'm really not very excited about adding new methods to Framer, but I'm okay with WriteDataHeader. It's simple, won't come with any significant maintenance burden, and the http2 package API is a complete mess already so it won't make things too much worse than they already are.

@dfawley
Copy link
Author

dfawley commented May 9, 2024

WriteDataHeader would be great.

I still think it would be good for future proposals if we had some kind of official answer to my question about the goals for the library. It seems like you're saying you'd prefer (1): "exported by accident; bug fixes only" but are willing to do (3): "extended within reason" as long as it's very minor. In the meantime, we will investigate stopping our use of it, but that would be a longer-term project.

I can send a CL for WriteDataHeader if you'd like. Let me know.

@rsc
Copy link
Contributor

rsc commented May 15, 2024

I agree that gRPC should probably copy this code and adapt it to their needs, but it's also fine to add the method.

@neild
Copy link
Contributor

neild commented May 15, 2024

@rsc: Is that a proposal committee decision? Or is this out-of-scope for proposal committee, being in x/?

@dfawley

I still think it would be good for future proposals if we had some kind of official answer to my question about the goals for the library.

"It's a basic, functional, minimal implementation."

Framer, as designed, can not "provide the best performance possible". The decision to route data through io.Reader and io.Writer precludes it. I suspect the representation of frames also imposes insurmountable limitations on performance, since they're allocation-heavy. If we want the highest possible performance, we need to redesign Framer entirely.

@rsc
Copy link
Contributor

rsc commented May 15, 2024

Have all remaining concerns about this proposal been addressed?

The proposal is to add:

// WriteDataHeader writes the header of a DATA frame containing size bytes.
// The caller is responsible for writing size bytes of data after the header.
func (f *Framer) WriteDataHeader(streamID uint32, endStream bool, size int) error

@dfawley
Copy link
Author

dfawley commented May 15, 2024

"It's a basic, functional, minimal implementation."

If that's the case, then you can close this request if you don't want it. The basic functionality is provided already. We have an intern starting soon who will look into removing our dependency on this package. That's very unfortunate from the standpoint of open source software and the implications of security vulnerabilities, but if it's your intention to maintain only the bare minimum of functionality here, with no consideration for performance, then that's clearly our only option.

@rsc
Copy link
Contributor

rsc commented May 23, 2024

if it's your intention to maintain only the bare minimum of functionality here, with no consideration for performance

This seems like an extreme statement to make. Clearly we do care about performance. The performance win of this specific API seems low though. Even so, it's likely to be accepted.

@rsc rsc moved this from Active to Likely Accept in Proposals May 23, 2024
@rsc
Copy link
Contributor

rsc commented May 23, 2024

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

The proposal is to add:

// WriteDataHeader writes the header of a DATA frame containing size bytes.
// The caller is responsible for writing size bytes of data after the header.
func (f *Framer) WriteDataHeader(streamID uint32, endStream bool, size int) error

@dfawley
Copy link
Author

dfawley commented May 23, 2024

This seems like an extreme statement to make. Clearly we do care about performance.

Sorry, in light of all the pushback in this issue, I'm trying to get a handle on my question about how this package is intended to be maintained.

The performance win of this specific API seems low though. Even so, it's likely to be accepted.

The win admittedly hasn't been measured. I'd be happy to do that experiment if it's helpful for justifying the change. For RPC payloads under 16kb, we are currently copying every byte into a separate buffer solely to add the 5 byte gRPC header and then call this method. We can't otherwise do any better right now given the APIs we need to work with (mainly: protobuf and the framer).

@rsc
Copy link
Contributor

rsc commented May 30, 2024

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

The proposal is to add:

// WriteDataHeader writes the header of a DATA frame containing size bytes.
// The caller is responsible for writing size bytes of data after the header.
func (f *Framer) WriteDataHeader(streamID uint32, endStream bool, size int) error

@rsc rsc moved this from Likely Accept to Accepted in Proposals May 30, 2024
@rsc rsc changed the title proposal: x/net/http2: add Framer.WriteDataN method x/net/http2: add Framer.WriteDataN method May 30, 2024
@rsc rsc modified the milestones: Proposal, Backlog May 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Accepted
Development

No branches or pull requests

6 participants