Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compression is applied to empty messages, which makes them bigger, not smaller #6831

Closed
jroper opened this issue Dec 3, 2023 · 10 comments · Fixed by #6842
Closed

Compression is applied to empty messages, which makes them bigger, not smaller #6831

jroper opened this issue Dec 3, 2023 · 10 comments · Fixed by #6842
Assignees
Labels
Type: Feature New features or improvements in behavior

Comments

@jroper
Copy link
Contributor

jroper commented Dec 3, 2023

From my reading of the source code, it appears that compression is applied to empty messages. This makes no sense, an empty message is 0 bytes long, while a gzipped compressed empty message is at least 21 bytes long. Empty messages should never be compressed. It makes no sense to compress any message that is less than 21 bytes, since the compressed message will always end up being longer than the original.

Example of where compression is done without checking the message length first:

https://github.com/grpc/grpc-go/blob/1b05500d8019885575a37efa2e961e278d680ee9/server.go#L1138C19-L1138C28

jroper added a commit to jroper/grpc-go that referenced this issue Dec 3, 2023
Fixes grpc#6831.

gzip compressing messages less than 21 bytes is always unfruitful, it always
results in "compressed" messages that are longer than the original message
length, since the minimum length of a gzipped message is 21 bytes, due to
headers and CRC trailer.
@dfawley
Copy link
Member

dfawley commented Dec 4, 2023

Can you explain the significance of this issue for you? Whether you do some epsilon additional work to produce a small amount more data, empty messages would still be extremely small and fast to transmit. This feels like a micro-optimization to me.

Note that a custom compressor could also implement similar behavior internally.

@jroper
Copy link
Contributor Author

jroper commented Dec 4, 2023

If this optimisation took 1000 lines of code to implement, then I would agree, it's a micro-optimisation that isn't worth the code to implement it. But that's not what this is. It's a 2 line branch that reduces the size of empty frames by over 80%.

The Java implementation doesn't compress empty messages, this behaviour isn't without precedent. If you'd like it to be simpler, I could change it to match the behaviour of the Java implementation and not decompress empty messages, rather than having a length limit of 21.

@dfawley
Copy link
Member

dfawley commented Dec 7, 2023

Even optimizing empty messages by 100% seems unlikely to have any measurable impact in any real system.

gzip isn't the only compressor we use, so "21" is a very magical number which might be worse in some cases, so I'd definitely rather not hard code that.

I think if we do this, I'd prefer to have an optional magic error (encoding.ErrDoNotCompress) that can be returned by a compressor to indicate that gRPC should send the message uncompressed. This could also allow for more dynamic behavior depending on the message contents or the failure to reduce the size (if len(compressed) >= len(original) { return nil, encoding.ErrDoNotCompress }).

@jroper
Copy link
Contributor Author

jroper commented Dec 8, 2023

That sounds reasonable to me. I'm happy to implement that if you're willing to accept it.

@dfawley dfawley added Type: Feature New features or improvements in behavior and removed Status: Requires Reporter Clarification Type: Bug labels Dec 8, 2023
@dfawley
Copy link
Member

dfawley commented Dec 8, 2023

SGTM, thanks!

@jroper
Copy link
Contributor Author

jroper commented Dec 8, 2023

One thing that just occurs to me, is possibly returning encoding.ErrDoNotCompress a breaking change that we care about? Are there clients to the Compressor interface outside of grpc-go that we don't want to break? For example, in my own codebase I've implemented grpc-web client support in go (for the purpose of traversing transparent, decrypting http/1.1 proxies that some of our customers in corporate environments use), and I've reused parts of the grpc-go API, including the encoding package, to implement that, so this change may break my code. Not that that's a problem for me, I can simply update my code to support it. But I just want to make sure this breaking change is being made consciously. It perhaps would warrant calling it out in the release notes.

@dfawley
Copy link
Member

dfawley commented Dec 8, 2023

One thing that just occurs to me, is possibly returning encoding.ErrDoNotCompress a breaking change that we care about?

I would say "no", since the dependency we care about is gRPC consuming this and users implementing it, not the other way around. But calling it out in the release notes SGTM.

@jroper
Copy link
Contributor Author

jroper commented Dec 8, 2023

Sounds fine to me.

@jroper
Copy link
Contributor Author

jroper commented Dec 8, 2023

There's a problem with this approach. The Compressor only wraps a Writer. When Compressor.Compress is invoked, it has no idea what the size of the message being compressed is. So we can't return the error from there. The only place we can return the error is from the Close method of the returned CloseWriter, because that's when the compressor will know the length of the message, and/or whether compression has been productive. But, by that time, it's already written out to the returned writer. In practice, this is ok because when we invoke it, we supply a buffer, so nothing has actually been written out yet (and we have to supply a buffer because we need to know the length of the returned buffer). But it still feels very wrong to emit a control message saying don't compress after writing out to the buffer.

So, I don't think this can be implemented inside the Compressor interface without breaking the Compressor interface (eg, to pass around []byte rather than io.Writer). I think simply not compressing empty messages is a reasonable trade off, ie, remove the magic 21. Then we don't need to have any knowledge of the compressor, other than the knowledge that no compressor to compress 0 bytes to be smaller than 0 bytes.

@dfawley
Copy link
Member

dfawley commented Dec 9, 2023

it still feels very wrong to emit a control message saying don't compress after writing out to the buffer.

Yeah..I think you could take advantage of the fact that we know Write is always called exactly once by gRPC and avoid that, but maybe we don't want to always call it that way, so it would probably be a bad idea to do that.

I'm fine with the 0-size skipping in that case. Or if you wanted to optimize even further, add a second optional method to Compressor called MinSizeToCompress() int (e.g.) that returns a number to compare against instead of hard-coding something specific.

jroper added a commit to jroper/grpc-go that referenced this issue Dec 9, 2023
Fixes grpc#6831.

This avoids compressing messages that are empty, since you can't compress zero
bytes to anything smaller than zero bytes, and most compression algorithms
output headers and trailers which means the resulting message will be non-zero
bytes.
jroper added a commit to jroper/grpc-go that referenced this issue Dec 9, 2023
Fixes grpc#6831.

This avoids compressing messages that are empty, since you can't compress zero
bytes to anything smaller than zero bytes, and most compression algorithms
output headers and trailers which means the resulting message will be non-zero
bytes.
dfawley pushed a commit that referenced this issue Jan 9, 2024
Fixes #6831.

This avoids compressing messages that are empty, since you can't compress zero
bytes to anything smaller than zero bytes, and most compression algorithms
output headers and trailers which means the resulting message will be non-zero
bytes.
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 8, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Type: Feature New features or improvements in behavior
Projects
None yet
2 participants