proto: add scatter/gather API for serialization #609

Open
MakMukhi opened this issue May 16, 2018 · 13 comments

@MakMukhi
Serialized data read from the wire often spans several data frames. The current proto.Unmarshal API requires that data from all these frames be copied into a single slice.
This extra copying (and memory allocation) could be avoided if we had a gather-style unmarshal that can deserialize data from multiple byte slices.

If there's agreement on the proposed API, I'd be happy to work on its implementation as well.
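For concreteness, a minimal sketch of the call site this would enable. Unmarshalv is hypothetical (taken from the issue title) and does not exist in the library; unmarshalFrames only illustrates the copy that exists today:

package transport

import (
	"github.com/golang/protobuf/proto"
)

// unmarshalFrames shows the flattening copy the proposal wants to
// eliminate. frames is a message payload split across data frames.
func unmarshalFrames(frames [][]byte, msg proto.Message) error {
	var total int
	for _, f := range frames {
		total += len(f)
	}
	// Today: all frames must be copied into one contiguous buffer.
	buf := make([]byte, 0, total)
	for _, f := range frames {
		buf = append(buf, f...) // the extra copy and allocation
	}
	return proto.Unmarshal(buf, msg)
	// Proposed (hypothetical): proto.Unmarshalv(msg, frames...)
}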

@dsnet
Member

dsnet commented May 17, 2018

\cc @randall77 @LMMilewski

@dsnet dsnet changed the title Proposal: proto.Unmarshalv(pb Message, b ...[]byte) proposal: proto: add Unmarshalv(pb Message, b ...[]byte) May 17, 2018
@randall77

If we're going to go this route, we should do UnmarshalStream(pb Message, r io.ReadSeeker). Then we could use a seeker version of io.MultiReader to combine byte streams.

(We need a seeker to allow multi-pass algorithms.)
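For illustration, a minimal sketch of such a seeker version of io.MultiReader, specialized to byte slices; multiSeeker and newMultiSeeker are invented names, not standard library API:

package transport

import (
	"errors"
	"io"
)

// multiSeeker presents a [][]byte as a single io.ReadSeeker so a
// multi-pass parser can rewind without the data being contiguous.
type multiSeeker struct {
	bufs [][]byte
	off  int64 // absolute read offset
	size int64
}

func newMultiSeeker(bufs ...[]byte) *multiSeeker {
	m := &multiSeeker{bufs: bufs}
	for _, b := range bufs {
		m.size += int64(len(b))
	}
	return m
}

// Read copies from the buffer containing the current offset; callers
// such as io.ReadFull loop to cross buffer boundaries.
func (m *multiSeeker) Read(p []byte) (int, error) {
	if m.off >= m.size {
		return 0, io.EOF
	}
	off := m.off
	for _, b := range m.bufs {
		if off < int64(len(b)) {
			n := copy(p, b[off:])
			m.off += int64(n)
			return n, nil
		}
		off -= int64(len(b))
	}
	return 0, io.EOF
}

// Seek repositions the offset, enabling the multi-pass algorithms
// mentioned above.
func (m *multiSeeker) Seek(offset int64, whence int) (int64, error) {
	var abs int64
	switch whence {
	case io.SeekStart:
		abs = offset
	case io.SeekCurrent:
		abs = m.off + offset
	case io.SeekEnd:
		abs = m.size + offset
	default:
		return 0, errors.New("multiSeeker: invalid whence")
	}
	if abs < 0 {
		return 0, errors.New("multiSeeker: negative position")
	}
	m.off = abs
	return abs, nil
}

Note that Read still copies into the caller's buffer, which is exactly the objection raised in the next comment.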

@MakMukhi
Author

I'm intentionally trying to stay away from the io.Reader API, since every read operation requires a buffer to be filled in by copying into it (Read(b []byte)).

My current idea is to take in this slice of slices and create a wrapper around it that provides the slice-like functionality the implementation looks for.

For instance, decodeVarint looks to read b[n]; our wrapper data structure can have a method for that, ReadByteAt(loc int) (byte, error), as sketched below.

This way we make minimal changes to the current implementation.
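To make that concrete, a rough sketch of the varint decoder sitting on such a wrapper; byteReaderAt and this standalone decodeVarint are illustrations of the idea, not the library's actual code:

package proto

import "errors"

// byteReaderAt is a hypothetical interface capturing the proposed
// ReadByteAt method.
type byteReaderAt interface {
	ReadByteAt(loc int) (byte, error)
}

// decodeVarint mirrors the b[n]-style indexing of the current
// decoder, but reads through the wrapper instead of a contiguous
// slice.
func decodeVarint(r byteReaderAt) (x uint64, n int, err error) {
	for shift := uint(0); shift < 64; shift += 7 {
		b, err := r.ReadByteAt(n)
		if err != nil {
			return 0, 0, err
		}
		n++
		x |= uint64(b&0x7f) << shift
		if b&0x80 == 0 {
			return x, n, nil
		}
	}
	return 0, 0, errors.New("proto: varint overflows a 64-bit integer")
}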

@dsnet
Member

dsnet commented May 17, 2018

I should mention that the net package uses an io.Reader along with type assertions to achieve this behavior. See golang/go#17607 and net.Buffers.
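For reference, a minimal demonstration of net.Buffers, a real standard library type ([][]byte implementing io.Reader and io.WriterTo):

package main

import (
	"fmt"
	"net"
	"os"
)

func main() {
	bufs := net.Buffers{
		[]byte("hello, "),
		[]byte("world\n"),
	}
	// For a plain io.Writer like os.Stdout, WriteTo falls back to
	// one Write call per slice; on a *net.TCPConn it can gather
	// them into a single writev system call where the platform
	// supports it.
	n, err := bufs.WriteTo(os.Stdout)
	fmt.Fprintf(os.Stderr, "wrote %d bytes, err=%v\n", n, err)
}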

@MakMukhi
Author

That does look promising. We can leverage net.Buffers by giving its WriteTo method a custom io.Writer that hands each written slice back to us. This requires changing the implementation a little more, but not substantially.
However, this makes me think of another solution: have our data structure just return the entire current slice and move on to the next one. The implementation changes would be exactly the same as with net.Buffers, without the added complexity.
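A sketch of that hand-back trick (sliceCapture is an invented name). Note that it bends the io.Writer contract, which says an implementation must not retain the slice passed to Write; that caveat is part of the added complexity being weighed here.

package main

import (
	"fmt"
	"net"
)

// sliceCapture hands each written slice to a callback instead of
// copying it anywhere.
type sliceCapture struct {
	consume func(b []byte)
}

func (w *sliceCapture) Write(b []byte) (int, error) {
	w.consume(b) // retains b: deliberate, contract-bending
	return len(b), nil
}

func main() {
	bufs := net.Buffers{[]byte("abc"), []byte("def")}
	w := &sliceCapture{consume: func(b []byte) {
		fmt.Printf("got a %d-byte slice without copying\n", len(b))
	}}
	bufs.WriteTo(w) // WriteTo calls w.Write once per slice here
}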

@MakMukhi
Author

Here's the wrapper data structure that I think can be hooked into the current implementation with very little friction:

package proto

import (
	"io"
	"unsafe"
)

// emptyBuf backs the non-nil empty slice returned by readn(0).
var emptyBuf [0]byte

// byteSlice wraps a slice of byte slices
// to provide slice-like functions on it.
type byteSlice struct {
	ofst int
	bufs [][]byte
	size int
}

func newByteSlice(bufs ...[]byte) *byteSlice {
	bs := &byteSlice{
		bufs: bufs,
	}
	for _, b := range bufs {
		bs.size += len(b)
	}
	return bs
}

// peekByteAt reads the byte at offset loc from the current
// position. It returns an error if there's no more data. It does
// not move ofst.
func (s *byteSlice) peekByteAt(loc int) (byte, error) {
	if s.ofst+loc >= s.size {
		return 0, io.ErrUnexpectedEOF
	}
	for _, b := range s.bufs {
		if len(b) > loc {
			return b[loc], nil
		}
		loc -= len(b)
	}
	panic("proto: byteSlice: size accounting is inconsistent")
}

// moveBy moves ofst by n.
func (s *byteSlice) moveBy(n int) error {
	if s.ofst+n > s.size {
		return io.ErrUnexpectedEOF
	}
	s.ofst += n
	for n > 0 {
		if len(s.bufs[0]) > n {
			s.bufs[0] = s.bufs[0][n:]
			return nil
		}
		n -= len(s.bufs[0])
		s.bufs = s.bufs[1:]
	}
	return nil
}

// Note: this allocates new memory and should only be
// used when assigning a byte slice to a proto.Message
// field.
func (s *byteSlice) readn(n int) ([]byte, error) {
	// We want a non-nil value returned when n is 0.
	if n == 0 {
		return emptyBuf[:], nil
	}
	if s.ofst+n > s.size {
		return nil, io.ErrUnexpectedEOF
	}
	// TODO(mmukhi): Evaluate variable usage to reduce
	// cache misses.
	s.ofst += n
	var p []byte
	for n > 0 {
		if len(s.bufs[0]) == 0 {
			s.bufs = s.bufs[1:]
			continue
		}
		sz := n
		if sz > len(s.bufs[0]) {
			sz = len(s.bufs[0])
		}
		// The use of append here is a trick which avoids the zeroing
		// that would be required if we used a make/copy pair.
		p = append(p, s.bufs[0][:sz]...)
		s.bufs[0] = s.bufs[0][sz:]
		n -= sz
	}
	return p, nil
}

// Note: this allocates new memory and should only be
// used when assigning a string to a proto.Message
// field.
func (s *byteSlice) readString(n int) (string, error) {
	b, err := s.readn(n)
	if err != nil {
		return "", err
	}
	// Casting via unsafe avoids the extra allocation and copy that
	// string(b) would make. This is safe only because b was freshly
	// allocated by readn and is never written to afterwards.
	return *(*string)(unsafe.Pointer(&b)), nil
}

func (s *byteSlice) length() int {
	return s.size - s.ofst
}

func (s *byteSlice) isEmpty() bool {
	return s.ofst >= s.size
}
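For a sense of how the pieces fit together, a quick usage sketch (hypothetical in-package code, assuming the wrapper above): a single-byte varint length followed by a payload that crosses a buffer boundary.

package proto

import "fmt"

func exampleByteSlice() {
	// Wire bytes: varint length 5, then "hello", split mid-payload.
	bs := newByteSlice([]byte{0x05, 'h', 'e'}, []byte{'l', 'l', 'o'})

	n, _ := bs.peekByteAt(0) // 0x05 < 0x80, so a one-byte varint
	bs.moveBy(1)             // consume the length prefix

	payload, _ := bs.readn(int(n)) // copies across both buffers
	fmt.Printf("%s remaining=%d\n", payload, bs.length())
	// Prints: hello remaining=0
}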

@LMMilewski
Member

@MakMukhi would you know how much this change would improve the gRPC benchmarks?

@MakMukhi
Author

This will cut down the memory footprint of each RPC call (request plus response) by half.

@gerben-s

I think this is needed in Go. If you implement this, look at the new C++ parser loop: the new parser is quite a bit faster, and its resumability makes it very flexible. In Go this could probably be implemented easily and elegantly with goroutines, removing the need for a side stack.

@dsnet
Member

dsnet commented Sep 10, 2018

Goroutines are certainly cheap, but not so cheap that you would want to spawn one for every unmarshal operation and deal with the synchronization.

In Go, the standard library only has support for writev-like functionality; readv support is still an open issue (see golang/go#17607). Personally, I am hesitant to see this functionality in protobufs unless it is symmetrical (supports both marshal and unmarshal). Furthermore, I'd like to see how golang/go#17607 plays out with regard to readv first, as I suspect it will have some effect on the API design here.

@nomadli

nomadli commented Dec 19, 2018

I think supporting the io.Reader/io.Writer APIs is needed; a TensorFlow model is about 1-2 GB, so I need 5 GB of memory to load the model.

@dfawley
Contributor

dfawley commented Dec 20, 2018

I'm intentionally trying to stay away from the io.Reader API, since every read operation requires a buffer to be filled in by copying into it (Read(b []byte)).

Looking back on this -- I think it's possible Reader/Writer can be used efficiently if done the right way, even though we initially believed they could not. To me, the intuitive way would be an API like this:

func UnmarshalFrom(io.Reader, proto.Message) error

However, this results in an extra, unnecessary copy as the Reader implementation copies its data into the buffer passed to it for the protobuf library to consume.

If we were to flip that around, we could avoid that extra copy. Something like:

type Unmarshaller struct{}

func (u *Unmarshaller) Write(b []byte) (int, error) { ... }

// Message (or maybe "Close") returns the proto.Message that the
// Unmarshaller wraps; the message is valid after this returns.
func (u *Unmarshaller) Message() (Message, error) { ... }

Here the implementation of Write would consume the bytes, copying only to maintain partial state that depends on subsequent data. This means the provider of the data could pass its buffer to the protobuf library, and no additional copy would be required.
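A hypothetical call site for that push style, feeding transport frames in as they arrive; NewUnmarshaller and the surrounding plumbing are invented for illustration and exist nowhere in the library:

func receive(frames <-chan []byte, msg proto.Message) (proto.Message, error) {
	u := NewUnmarshaller(msg) // invented constructor wrapping msg
	for f := range frames {
		// Each frame is handed straight to the parser; no
		// flattening into a single contiguous buffer first.
		if _, err := u.Write(f); err != nil {
			return nil, err
		}
	}
	return u.Message() // msg is fully populated after this returns
}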

Vice versa, a marshaller could provide an io.Reader, and the proto library would serialize directly into the user's buffer(s), avoiding an unnecessary copy on the way out.

@dsnet - what do you think?
