
Limit the max length of decoded content in Decoding{Service,Client} #4564

Merged
minwoox merged 6 commits into line:master from the decoding-limit branch on Dec 20, 2022

Conversation

Contributor

@ikhoon ikhoon commented Dec 7, 2022

Motivation:

ServerBuilder.maxRequestLength() and
ClientBuilder.maxResponseLength() can limit the maximum content length at the session layer.

/**
 * Sets the maximum allowed length of the content decoded at the session layer.
 * e.g. the content length of an HTTP request.
 *
 * @param maxRequestLength the maximum allowed length. {@code 0} disables the length limit.
 */
public ServerBuilder maxRequestLength(long maxRequestLength) {
The length-limited compressed content could still be a humongous object if the original content was compressed with a high ratio. Such big objects may cause OutOfMemoryError even if the content length is limited at the session layer. For that reason, it is necessary to check the length of the output content while decompressing the compressed content.
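
For reference, here is a minimal sketch of configuring the session-layer limits mentioned above; the endpoint, service, and 10 MiB values are illustrative, not taken from this PR.

import com.linecorp.armeria.client.WebClient;
import com.linecorp.armeria.common.HttpResponse;
import com.linecorp.armeria.common.HttpStatus;
import com.linecorp.armeria.server.Server;

final class SessionLayerLimits {
    public static void main(String[] args) {
        // Caps the *encoded* request content at the session layer; a small compressed
        // body can still decompress into something far larger than this limit.
        final Server server = Server.builder()
                .maxRequestLength(10 * 1024 * 1024) // 10 MiB, illustrative value
                .service("/", (ctx, req) -> HttpResponse.of(HttpStatus.OK))
                .build();

        // The same idea on the client side for the encoded response content.
        final WebClient client = WebClient.builder("https://example.com")
                .maxResponseLength(10 * 1024 * 1024) // 10 MiB, illustrative value
                .build();
    }
}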

This PR also fixes #4469.

Note:

ZlibDecoder is able to limit the size at the decoder level. Unfortunately, there is no such option in BrotliDecoder, but its output is chunked into 4 MiB blocks when it exceeds 4 MiB. We can't enforce the output size precisely, but 4 MiB seems an acceptable granularity.
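
As a rough sketch, a size-capped zlib decoder can be constructed as below; the wrapper class and method name are illustrative, while the decoder constructors match the diff reviewed later in this PR.

import io.netty.handler.codec.compression.JZlibDecoder;
import io.netty.handler.codec.compression.JdkZlibDecoder;
import io.netty.handler.codec.compression.ZlibDecoder;
import io.netty.handler.codec.compression.ZlibWrapper;

final class ZlibDecoders {
    // maxLength caps the decompression buffer; the decoder raises a
    // DecompressionException once the decoded data grows beyond it.
    static ZlibDecoder newZlibDecoder(ZlibWrapper wrapper, boolean noJdkZlibDecoder, int maxLength) {
        if (noJdkZlibDecoder) {
            return new JZlibDecoder(wrapper, maxLength);
        }
        // `true` enables decompressConcatenated, matching ZlibCodecFactory.newZlibDecoder().
        return new JdkZlibDecoder(wrapper, true, maxLength);
    }
}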

Modifications:

  • Added a new API that takes the max length of a decompressed object to StreamDecoderFactory.
  • Fixed StreamDecoder to raise ContentTooLargeException if the total length of a decoded object exceeds the limit.
    • Fixed ZlibStreamDecoder to hand over the given maxLength to ZlibDecoder.
    • Fixed AbstractStreamDecoder to limit the content length after decoding.
  • Added com.aayushatharva.brotli4j:native-osx-aarch64 as an optional dependency.
  • Removed the obsolete client.encoding.StreamDecoderFactories and client.encoding.ZlibStreamDecoder, which can be replaced with the same classes in common.encoding.
  • Made ContentTooLargeException take a cause, which is most likely a DecompressionException raised by ZlibDecoder.
  • Fixed DefaultServerErrorHandler to return 413 Request Entity Too Large for ContentTooLargeException.
  • Replaced ByteArrayOutputStream with ByteBufOutputStream in HttpEncodedResponse for zero-copy.
  • The Content-Encoding header is now removed by DecodingClient and DecodingService when compressed content is to be decompressed.

Result:

  • DecodingClient and DecodingService can now limit the maximum length of decompressed data.
  • Fixes #4469
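
A hedged usage sketch of the new limit follows; DecodingService.builder(), DecodingClient.builder(), and the maxRequestLength()/maxResponseLength() method names are assumptions made for illustration and are not confirmed by this PR text.

import com.linecorp.armeria.client.WebClient;
import com.linecorp.armeria.client.encoding.DecodingClient;
import com.linecorp.armeria.common.HttpResponse;
import com.linecorp.armeria.common.HttpStatus;
import com.linecorp.armeria.server.Server;
import com.linecorp.armeria.server.encoding.DecodingService;

final class DecodingLimits {
    public static void main(String[] args) {
        // Server side: cap the decompressed request content (builder and method names assumed).
        final Server server = Server.builder()
                .service("/", (ctx, req) -> HttpResponse.of(HttpStatus.OK))
                .decorator(DecodingService.builder()
                                          .maxRequestLength(10 * 1024 * 1024) // assumed API
                                          .newDecorator())
                .build();

        // Client side: cap the decompressed response content (builder and method names assumed).
        final WebClient client = WebClient.builder("https://example.com")
                .decorator(DecodingClient.builder()
                                         .maxResponseLength(10 * 1024 * 1024) // assumed API
                                         .newDecorator())
                .build();
    }
}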
@ikhoon ikhoon added this to the 1.21.0 milestone Dec 7, 2022
Member

@minwoox minwoox left a comment

Looks great! Left some small suggestions. 😉

    decoder.writeInbound(obj.byteBuf());
} catch (DecompressionException ex) {
    final String message = ex.getMessage();
    if (message != null && message.startsWith("Decompression buffer has reached maximum size:")) {
Member

A small suggestion. 😄
How about filing an issue to the upstream for introducing a specific exception type for this so that we don't rely on the message?

Contributor Author

Agreed.

decoded.release();
}
newBuf.release();
throw ContentTooLargeException.builder()
Member

Could you check the call path of this method to see whether we throw this exception where we shouldn't?
It seems like AbstractStreamDecoder.finish() is called in onComplete() and the exception isn't caught.

Contributor Author

Nice point. We need to call delegate.onError() if beforeError() raises an exception.
Let me check possible cases.

Contributor Author

Fixed to handle ContentTooLargeException (see the sketch after this list) in:

  • beforeComplete(): Halt onComplete() and call onError() instead
  • beforeError(): Create a CompositeException and set both the original exception and ContentTooLargeException
  • onCancellation(): Since the stream was canceled, just warn the error message.
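
A generic sketch of the completion-path handling described above; this is not the actual Armeria code, and the helper and parameter names are made up for illustration.

import org.reactivestreams.Subscriber;

final class CompleteOrError {
    // Work deferred to stream completion, such as StreamDecoder.finish(), can still fail
    // with ContentTooLargeException, so the failure must be surfaced through onError()
    // instead of letting the stream complete normally.
    static <T> void completeOrError(Runnable finishDecoding, Subscriber<? super T> downstream) {
        try {
            finishDecoding.run(); // e.g. flush the decoder and verify the total decoded length
        } catch (RuntimeException cause) {
            downstream.onError(cause); // halt onComplete() and propagate the failure instead
            return;
        }
        downstream.onComplete();
    }
}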

if (noJdkZlibDecoder) {
    return new JZlibDecoder(wrapper, maxLength);
} else {
    return new JdkZlibDecoder(wrapper, true, maxLength);
Member

What does the true mean? It seems like the default value is false in JdkZlibDecoder.

Contributor Author

The original API we used, ZlibCodecFactory.newZlibDecoder(), internally sets decompressConcatenated to true.
https://github.com/netty/netty/blob/4.1/codec/src/main/java/io/netty/handler/codec/compression/ZlibCodecFactory.java#L124

Member

netty/netty@6ff48dc
It seems like a gzip stream could be a concatenation of multiple streams.
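
For illustration, a gzip stream containing two members can be produced as below (RFC 1952 allows concatenated members, which is why decompressConcatenated matters); the strings are arbitrary.

import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

final class ConcatenatedGzip {
    public static void main(String[] args) throws Exception {
        final ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Each try-with-resources block writes a complete, independent gzip member.
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write("Hello, ".getBytes(StandardCharsets.UTF_8));
        }
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write("Armeria!".getBytes(StandardCharsets.UTF_8));
        }
        // `out` now holds two back-to-back gzip members; a decoder with
        // decompressConcatenated enabled decodes both, yielding "Hello, Armeria!".
        System.out.println("Concatenated gzip stream: " + out.size() + " bytes");
    }
}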

@@ -92,7 +101,7 @@ protected HttpObject filter(HttpObject obj) {
         return obj;
     }

-    encodedStream = new ByteArrayOutputStream();
+    encodedStream = new ByteBufOutputStream(alloc.buffer());
Member

How about specifying the initial capacity using the value from the content-length header?

Contributor Author

Nice idea.

Contributor Author
By the way, most text data shrinks by a factor of 5 to 8 when compressed, so I'm not sure what ratio is a sensible value. Let me set content-length / 3 as the initial value.

Contributor Author

@ikhoon ikhoon Dec 15, 2022

I changed the initial value to content-length / 2. The compression ratio would usually be higher than 50%.
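
A rough sketch of this approach; the class, method, and fallback behavior are illustrative rather than the exact PR code.

import io.netty.buffer.ByteBufAllocator;
import io.netty.buffer.ByteBufOutputStream;

final class EncodedStreams {
    // Assume the compressed output is at most about half of the original content and use
    // that as the initial buffer capacity to avoid repeated buffer expansions.
    static ByteBufOutputStream newEncodedStream(ByteBufAllocator alloc, long contentLength) {
        if (contentLength > 0) {
            final int initialCapacity = (int) Math.min(Integer.MAX_VALUE, contentLength / 2);
            return new ByteBufOutputStream(alloc.buffer(initialCapacity));
        }
        return new ByteBufOutputStream(alloc.buffer()); // unknown Content-Length: default capacity
    }
}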

Contributor

@jrhee17 jrhee17 left a comment

Overall looks good 👍 Left a minor question

@@ -41,6 +41,11 @@ final class HttpEncoders {

     private static final Encoder.Parameters BROTLI_PARAMETERS = new Encoder.Parameters().setQuality(4);

+    static {
+        // Invoke to load Brotli native binary.
+        Brotli.isAvailable();
Contributor
Question) Why is it necessary to pre-load this class here?

Contributor Author

@ikhoon ikhoon Dec 15, 2022

It seems to be a defect of brotli4j that the Brotli native library is not initialized when a BrotliOutputStream is created.
It is not a problem in production because Brotli.isAvailable() is called before the Brotli codec is applied in the request call path.

However, when I tried to test the Brotli codec with HttpEncoders, the test failed because the Brotli native library wasn't loaded. I added this call for testing convenience.

Member

However, when I tried to test the Brotli codec with HttpEncoders, the test failed because the Brotli native library wasn't loaded. I added this call for testing convenience.

That's interesting. Which test failed without this?

Contributor Author

You can see the brotli initialization error in HttpEncodedResponseTest.shouldReleaseEncodedStreamOnError() after removing Brotli.isAvailable().

Member

Ah, I get it. determineEncoding(), which calls Brotli.isAvailable(), is called beforehand, so it was no big deal when the code is used in production. 😉

Member

@minwoox minwoox left a comment

Thanks, @ikhoon!


Contributor

@jrhee17 jrhee17 left a comment

Looks great! Thanks @ikhoon ! 🙇 👍 🙇

@minwoox minwoox merged commit dd49448 into line:master Dec 20, 2022
Member

minwoox commented Dec 20, 2022

🎉 🎉 🎉

@ikhoon ikhoon deleted the decoding-limit branch May 25, 2023 10:04

Successfully merging this pull request may close these issues.

Consider removing content-encoding header in DecodingService and DecodingClient