
Fixed UnpackerConfig.bufferSize to work #658

Closed · 2 commits

Conversation

napo0703 (Author):
This resolves #657.

It limits the internal buffer size when unpacking an Array or Map.
However, I think the default bufferSize of 8192 is too small, so I changed the default buffer size to 100 MiB, following msgpack-python.

@@ -518,7 +518,7 @@ public boolean isStr8FormatSupport()

     private int stringSizeLimit = Integer.MAX_VALUE;

-    private int bufferSize = 8192;
+    private int bufferSize = 100 * 1024 * 1024;
xerial (Member):

Don't change the default buffer size. msgpack-java is frequently used even for small data.

xerial (Member):

In particular, creating a large buffer is not free: Java spends CPU time filling the array contents with zeros.
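The zero-filling this comment refers to is guaranteed by the language: every element of a newly allocated Java array is initialized to its default value, so allocating a 100 MiB byte[] touches all 100 MiB up front. A minimal standalone sketch (not msgpack-java code) to illustrate the cost:

```java
public class ZeroFillCost {
    public static void main(String[] args) {
        long start = System.nanoTime();
        // The JVM must zero-fill the whole 100 MiB before returning
        // (JLS 4.12.5), costing CPU time proportional to the array size.
        byte[] big = new byte[100 * 1024 * 1024];
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // Every element is guaranteed to be zero after allocation
        assert big[0] == 0 && big[big.length - 1] == 0;
        System.out.println("Allocated " + big.length + " zeroed bytes in " + elapsedMs + " ms");
    }
}
```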

     * @throws IOException when underlying input throws IOException
     */
    public byte[] readPayload(int length)
            throws IOException
    {
        if (length > bufferSize) {
xerial (Member):

This part is a good catch. When that happens, instead of throwing an error, we need to enlarge the buffer by allocating a new one large enough to hold the expected data size.

This implementation would be similar to Java's ArrayList implementation.
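The ArrayList-style growth the reviewer alludes to could be sketched like this (a hypothetical illustration with made-up names, not msgpack-java's actual code): grow by roughly 1.5x, but never less than the requested capacity.

```java
import java.util.Arrays;

// Hypothetical sketch of ArrayList-style buffer growth; class and method
// names are illustrative, not msgpack-java's actual implementation.
public class GrowableBuffer {
    private byte[] buffer = new byte[8192]; // msgpack-java's default bufferSize

    // Ensure the buffer can hold at least `required` bytes, growing by ~1.5x
    // (like java.util.ArrayList) instead of failing with an exception.
    public byte[] ensureCapacity(int required) {
        if (required > buffer.length) {
            int grown = buffer.length + (buffer.length >> 1); // old * 1.5
            int newCapacity = Math.max(grown, required);
            buffer = Arrays.copyOf(buffer, newCapacity);
        }
        return buffer;
    }

    public int capacity() {
        return buffer.length;
    }
}
```

Growing geometrically keeps the amortized cost of repeated enlargements linear, while `Math.max` guarantees a single oversized payload is satisfied in one step.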

napo0703 (Author):

I'm sorry, but I don't quite understand this part.
Am I correct that the purpose of UnpackerConfig.bufferSize is to limit the size of the new byte array in readPayload(int)?
Do I need to change this code?

xerial (Member) left a review:

  • Do not change the default buffer size from 8K
  • Setting the default buffer size from config is ok
  • Do not throw an exception in unpackXXXheader. Only readPayload needs to be fixed to support large volumes of data
  • readPayload should swap the existing buffer to a large buffer if the payload length exceeds the current bufferSize
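The last bullet could be sketched as follows (hypothetical code against a plain InputStream, not the actual MessageUnpacker internals): readPayload swaps in a larger buffer on demand instead of throwing when the payload exceeds the current bufferSize.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical sketch of the behaviour the review asks for; the real
// MessageUnpacker reads through a MessageBufferInput, not a raw InputStream.
public class PayloadReader {
    private byte[] buffer;
    private final InputStream in;

    public PayloadReader(InputStream in, int bufferSize) {
        this.in = in;
        this.buffer = new byte[bufferSize];
    }

    public byte[] readPayload(int length) throws IOException {
        if (length > buffer.length) {
            // Swap the existing buffer for one large enough for the payload,
            // instead of failing with a MessageSizeException.
            buffer = new byte[length];
        }
        int offset = 0;
        while (offset < length) {
            int read = in.read(buffer, offset, length - offset);
            if (read < 0) {
                throw new IOException("insufficient data");
            }
            offset += read;
        }
        return Arrays.copyOf(buffer, length);
    }
}
```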

Comment on lines 1336 to 1360:

-        if (Code.isFixedArray(b)) { // fixarray
-            return b & 0x0f;
-        }
-        switch (b) {
-            case Code.ARRAY16: { // array 16
-                int len = readNextLength16();
-                return len;
-            }
-            case Code.ARRAY32: { // array 32
-                int len = readNextLength32();
-                return len;
-            }
-        }
-        throw unexpected("Array", b);
+        int len;
+        if (Code.isFixedArray(b)) { // fixarray
+            len = b & 0x0f;
+        }
+        else {
+            switch (b) {
+                case Code.ARRAY16: { // array 16
+                    len = readNextLength16();
+                    break;
+                }
+                case Code.ARRAY32: { // array 32
+                    len = readNextLength32();
+                    break;
+                }
+                default: {
+                    throw unexpected("Array", b);
+                }
+            }
+        }
+
+        if (len > bufferSize) {
+            throw new MessageSizeException(String.format("cannot unpack a Array of size larger than %,d: %,d", bufferSize, len), len);
+        }
+
+        return len;
xerial (Member):

Throwing an error here is unnecessary if readPayload checks the buffer size.

Comment on lines 1380 to 1404:

-        if (Code.isFixedMap(b)) { // fixmap
-            return b & 0x0f;
-        }
-        switch (b) {
-            case Code.MAP16: { // map 16
-                int len = readNextLength16();
-                return len;
-            }
-            case Code.MAP32: { // map 32
-                int len = readNextLength32();
-                return len;
-            }
-        }
-        throw unexpected("Map", b);
+        int len;
+        if (Code.isFixedMap(b)) { // fixmap
+            len = b & 0x0f;
+        }
+        else {
+            switch (b) {
+                case Code.MAP16: { // map 16
+                    len = readNextLength16();
+                    break;
+                }
+                case Code.MAP32: { // map 32
+                    len = readNextLength32();
+                    break;
+                }
+                default: {
+                    throw unexpected("Map", b);
+                }
+            }
+        }
+
+        if (len > bufferSize / 2) {
+            throw new MessageSizeException(String.format("cannot unpack a Map of size larger than %,d: %,d", bufferSize / 2, len), len);
+        }
+
+        return len;
xerial (Member):

Unnecessary change, for the same reason.

napo0703 (Author):

The buffer size limit now works, but some tests no longer pass.
Should I change the Unpacker buffer size used by these tests, like this?

-        val unpacker = MessagePack.newDefaultUnpacker(new InputStreamBufferInput(new ByteArrayInputStream(out.toByteArray)))
+        val unpacker = new MessagePack.UnpackerConfig().withBufferSize(16 * 1024).newUnpacker(new InputStreamBufferInput(new ByteArrayInputStream(out.toByteArray)))

xerial (Member) commented on Sep 24, 2023:

Upon re-reading the PR, I noticed some misunderstandings.

UnpackerConfig.bufferSize is used to control the internal buffer size, but it is not meant to limit the size of the uncompressed data. In order to prevent loading excessively large data into memory, we need to introduce another configuration.

I will close this pull request because it has become outdated. The requirement to limit the uncompressed data size and fail the Unpacker is still valid, but it should be addressed in a different context.
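The separate configuration xerial mentions could be sketched as follows (a hypothetical illustration with made-up names; msgpack-java's actual API only confirms bufferSize and stringSizeLimit here): the internal buffer size and the safety limit on declared data sizes are two independent settings.

```java
// Hypothetical sketch separating the two concerns described above:
// bufferSize controls internal I/O buffering, while a distinct
// maxPayloadSize guards against excessively large declared sizes.
public class UnpackLimits {
    private final int bufferSize;      // internal buffer, small by default
    private final long maxPayloadSize; // limit on the declared payload size

    public UnpackLimits(int bufferSize, long maxPayloadSize) {
        this.bufferSize = bufferSize;
        this.maxPayloadSize = maxPayloadSize;
    }

    // Reject a declared payload length before allocating anything,
    // so a malicious header cannot trigger a huge allocation.
    public void checkDeclaredSize(long declared) {
        if (declared > maxPayloadSize) {
            throw new IllegalArgumentException(
                    "declared size " + declared + " exceeds limit " + maxPayloadSize);
        }
    }

    public int bufferSize() {
        return bufferSize;
    }
}
```

Checking the declared size before any allocation is what prevents the memory exhaustion described in the linked issue, independently of how the internal buffer is sized.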

xerial closed this on Sep 24, 2023.
Linked issue: Consuming a large amount of memory when unpacking huge Array and Map unexpectedly
2 participants