
compress/flate: implement partial flush #31514

Open
hanwen opened this issue Apr 17, 2019 · 16 comments
@hanwen hanwen commented Apr 17, 2019

What version of Go are you using (go version)?

go-1.11.5

While trying to implement zlib compression for SSH (at internal Google request,
#31369), I stumbled on the lack of support for partial flush.

Partial flush is described more precisely here: https://www.bolet.org/~pornin/deflate-flush.html (look for Z_PARTIAL_FLUSH). While this functionality is deprecated, it has also made its way into the SSH RFC.

For the write side, it looks like I can't get the same effect by truncating the sync marker from a full flush.

For the read side, feeding packets from OpenSSH into a zlib.Reader leads to errors ("unexpected EOF").

@bradfitz changed the title from "zlib: implement partial flush" to "compress/flate: implement partial flush" on Apr 17, 2019
@bradfitz bradfitz commented Apr 17, 2019

I assume this is in the flate package where you need this? At least initially?

/cc @nigeltao

@hanwen hanwen commented Apr 17, 2019

From what I could tell, the real work is in the flate package, but the zlib package should also expose it, because SSH produces and consumes the zlib version header.

@dsnet dsnet commented Apr 17, 2019

What about the reader side of things? Wouldn't a partial flush on the writer only be useful if the reader had API to synchronize on these specialized flushing modes?

I should note that this is more difficult on the reader side for API design, since the reader lacks an exported concrete type.

@dsnet dsnet commented Apr 17, 2019

Also, my reading suggests that a partial flush is a subset of a sync flush, which the writer already supports.

@hanwen hanwen commented Apr 18, 2019

I've seen that the bytes already get decompressed correctly (so it kind of works), but then the reader wants to read on and finds EOF. The reader should stop reading further when it encounters a partial flush.

For the API: when the reader encounters a partial flush, it could just return the bytes read so far without error. In SSH, the caller could then use a bytes.Buffer as input to the reader, and stop requesting further zlib bytes when the buffer is empty.

@dsnet dsnet commented Apr 18, 2019

@hanwen, is there internal framing within the SSH protocol such that it can distinguish between a return from Read due to a partial flush, as opposed to a return due to other reasons (e.g., a full buffer)?

@hanwen hanwen commented Apr 18, 2019

The SSH protocol has packets that are up to 32 KB in size. The entire packet payload is compressed, so one would read until the entire packet was processed.

I think there could be chunking going on, so you may need to do multiple reads to process a single packet.

@dsnet dsnet commented Apr 19, 2019

The entire packet payload is compressed, so one would read until the entire packet was processed.

I'm not sure that answered my question. More specifically: how does SSH handle framing for the packet? For the uncompressed data, does it have a length prefix or a delimiter marking the length or end of the packet? Or does it rely on a partial flush in DEFLATE to determine the end of a packet?

If the former, all that's required in compress/flate is to return from Read every time a block ends. If the latter, we need additional API in compress/flate to synchronize on the different ways a DEFLATE block can end.

@hanwen hanwen commented Apr 22, 2019

There is no size prefix or delimiter. On the read side, you just keep reading until there are no compressed bytes left to process. If the Read API were changed to

// Read reads up to len(buf) bytes, or up to the next partial or full flush
// encountered in the compressed input stream.
Read(buf []byte) (n int, err error)
then you could do

  // init
  buf := bytes.NewBuffer(compressed)
  reader, err := zlib.NewReader(buf)
  // err handling
  ...

  // handle packet:
  buf.Write(compressedPacket)
  var uncompressed []byte
  for buf.Len() > 0 {
      var out [10240]byte
      n, err := reader.Read(out[:])
      // err handling
      uncompressed = append(uncompressed, out[:n]...)
  }

This might break current users that expect to be able to read a certain amount of data in a single Read call, though.

There is packet framing, but it is around the compressed payload: the compressed data is padded, encrypted, MACed, and prefixed with the total length of the encrypted packet.

@branelmoro branelmoro commented Jul 19, 2019

@hanwen if we put a limit on the number of bytes read from the underlying Reader and return an error when the limit is reached, then I think this will solve the problem of reading until EOF.

We will need a new interface for the decompressor:

type Limiter interface {
    // Sets limit on number of bytes read from underlying Reader
    SetReadLimit(n int)
}

Also, we need to add a new field to store the limit on the number of bytes read from the underlying Reader:

type decompressor struct {
    ...
    l int
    ...
    ...
}

If l == -1 (the default set by the constructors), no read limit is set. If l > 0, at most l more bytes can be read from the underlying Reader; once l reaches 0, any further read returns a ReadLimitReached error.

Set f.l = -1 as the default value in the two functions below:

func NewReader(r io.Reader) io.ReadCloser {
    ...
    f.l = -1
    ...
    ...
}

func NewReaderDict(r io.Reader, dict []byte) io.ReadCloser {
    ...
    f.l = -1
    ...
    ...
}

We will need two more functions for decompressor:

func (f *decompressor) SetReadLimit(n int) {
    f.l = n
}

func (f *decompressor) readByte() (byte, error) {
    if f.l == 0 {
        return 0, errors.New("ReadLimitReached")
    }
    b, e := f.r.ReadByte()
    if e == nil && f.l != -1 {
        f.l--
    }
    return b, e
}

And finally, we need to replace calls to f.r.ReadByte() with f.readByte().

Then, in the application, you can do:

import "compress/flate"

decompressor := flate.NewReader(compressed_buffer)
size := 10 // or size := compressed_buffer.Len() or some known size of partial flush of compressed_buffer
decompressor.(flate.Limiter).SetReadLimit(size)
_, err = io.Copy(decompressed_buffer, decompressor)

The above code will return a ReadLimitReached error after reading 10 compressed bytes, instead of reading until EOF.
You can then keep using the decompressor by setting the limit again or by resetting the underlying Reader.

This will not break current users.

@branelmoro branelmoro commented Jul 19, 2019

@hanwen I have created a pull request for this: #33218
I hope this works...

@gopherbot gopherbot commented Jul 22, 2019

Change https://golang.org/cl/187018 mentions this issue: Fixes #31514 - implement partial flush for decompressor

@gopherbot gopherbot commented Oct 3, 2019

Change https://golang.org/cl/198757 mentions this issue: compress/flate : add a check to recognize partial flush while decompressing

@nhooyr nhooyr commented Dec 30, 2019

The synchronization example in the docs at https://golang.org/pkg/compress/flate/ needs to be updated.

They state:

			// Read the message length.
			// This is guaranteed to return for every corresponding
			// Flush and Close on the transmitter side.

But that is not true for a Flush on a writer without pending data. compress/flate will write the 4-byte sync marker, but the read side will never return.

See this check in the decompressor's read loop:

if len(f.toRead) > 0 {

The loop only returns if there are additional bytes in the buffer to be read.

@nhooyr nhooyr commented Dec 31, 2019

In order to implement context-takeover compression for WebSockets, I was able to work around the above issue by panicking in the underlying io.Reader after the sync marker is read and then recovering. The reader remains usable after this with the correct sliding window.

Janky but works for now.

@nigeltao nigeltao commented Jan 20, 2020

I think it's @dsnet's call about what to do re @nhooyr's latest comments.
