Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API design: should depacketizers provide data as impl Buf or Bytes/Vec<u8>? #4

Closed
scottlamb opened this issue Jun 9, 2021 · 0 comments

Comments

@scottlamb
Copy link
Owner

scottlamb commented Jun 9, 2021

Right now retina::codec::VideoFrame implements Buf and supports only H.264. It always has two chunks: an AVC length prefix and the header+body of the (single) picture NAL (note I'll need to extend it for multiple NALs). The idea was to support zero-copy, but I think right now it's kind of the worst of all worlds:

  • if the NAL is fragmented (most of the bytes and maybe even most of the NALs on a typical stream), it copies it all into a BytesMut with a guesstimated size. It might copy again to grow partway through, and then probably (I haven't looked at profiles) copies it again on BytesMut::freeze due to tokio-rs/bytes#401.
  • if the caller wants a single contiguous buffer, they'll end up copying it themselves. (Maybe with bytes::Buf::copy_to_bytes, maybe not.)
  • you can only iterate through it once, which might be a problem for some usages.

AudioFrame and MessageFrame provide a Bytes directly; there's no good reason all three frame bytes shouldn't present their data in the same way.

I'd prefer to follow one of two paths. I haven't made up my mind as to which:

  • truly support zero-copy via data(&self) -> impl Buf<'self>. VideoFrame needs a Vec of chunks. If you want to iterate multiple times, you can just call data as many times as you want. You can use Buf::chunks_vectored + std::io::Write::write_vectored / tokio::io::AsyncWriteExt::write_vectored.
  • put everything into a single buffer. Accumulate Bytes in a reused Vec in Depacketizer::push then concatenate them during Depacketizer::pull when the total size is known, avoiding the extra copies mentioned above.

Arguments in favor of zero-copy (custom Buf implementation with multiple chunks):

  • People like zero-copy; it's generally assumed to be more efficient (but see below).
  • One neat API trick this would allow (for H.264) is selecting the Annex B encoding (00 00 00 01 between NALs) or the AVC encoding (length prefix between NALs) via something like h264(&self) -> Option<&H264Frame> then data_as_annexb(&self) -> impl Buf or data_as_avc(&self, len_prefix_size: usize) -> impl Buf. With the single-buffer approach I'd probably make folks choose when setting up the depacketizer instead. (I guess it'd also be fairly efficient to have a &mut accessor on the VideoFrame which does a one-way conversion from 4-byte AVC to Annex B, but that's a weird API.) I can imagine someone doing some fan-out thing where they actually want both encodings.

Arguments in favor of a single Bytes or Vec<u8>:

  • More convenient / simpler to get right IMHO. The zero-copy APIs seem half-baked/finicky, eg tokio's tokio::io::AsyncWriteExt::write_buf just writes the first chunk, and there's no write_all_vectored in either std::io::Write or tokio::io::AsyncWriteExt.
  • It might actually be more efficient. The answer isn't obvious to me. With my IP cameras, the IDR frames can be half a megabyte, fragmented across hundreds of RTP packets of 1400ish bytes. Just the &[Bytes] is then tens of kilobytes (four pointers per element). Far too big for the no-alloc path of SmallVec<[Bytes; N]>. And if someone's doing a writev call later, they have to set up/iterate through hundreds of IoSlices to write the whole thing at once. (And you can't even reuse a Vec<IoSlice> between iterations without trickery because it expects to operate on a mutable slice of initialized IoSlice objects.) It wouldn't surprise me if zero-copy is actually slower.
  • I'm not sure but I think Bytes is mostly just a tokio thing. async std folks might use just Vec<u8> or something instead. Although I'll likely keep using Bytes internally anyway to keep the individual packets around between push and pull.

Right now I'm leaning toward single Bytes. I might try benchmarking both but if the performance is close I think simplicity should win.

@scottlamb scottlamb changed the title API design: should depacketizers provide data as impl Buf or Bytes? API design: should depacketizers provide data as impl Buf or Bytes/Vec<u8>? Jun 9, 2021
@scottlamb scottlamb mentioned this issue Aug 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant