
[Feature]Add zstd decoder #10422

Closed · wants to merge 22 commits

Conversation

skyguard1
Contributor

@skyguard1 skyguard1 commented Jul 22, 2020

Motivation:

Zstandard (https://facebook.github.io/zstd/) is a high-performance compression algorithm with a high compression ratio. This PR adds netty support for the Zstandard algorithm. The implementation relies on zstd-jni (https://github.com/luben/zstd-jni), an open-source third-party library; Apache Kafka also uses it for message compression. Please review this PR, thanks.

Modification:

Add ZstdDecoder and test case.

Result:

netty supports ZSTD with ZstdDecoder

@netty-bot

Can one of the admins verify this patch?


codec/pom.xml Outdated
@hyperxpro
Contributor

I was going to add Brotli support but got stuck with some projects. If you got some time, would you mind adding Brotli support too, please?

See #6899

Contributor

@hyperxpro hyperxpro left a comment

Good work! 👍 Just minor code style fixes.

@@ -0,0 +1,33 @@
package io.netty.handler.codec.compression;

public class ZstdConstants {
Contributor

Can be final


private boolean validateCheckSum;

public ZstdDecoder() {
Contributor

Remove the extra blank lines at 30, 32, 34, 36, 38 and 40.

return close(ctx().newPromise());
}


Contributor

As above, remove all double and extra lines between variables.

return start + random.nextInt(end - start + 1);
}

}
Contributor

End with a new line

Random random = new Random();
return start + random.nextInt(end - start + 1);
}

Contributor

As above, remove all double and extra lines between variables.

return sb.toString();
}

}
Contributor

End with a new line

return sb.toString();
}

}
Contributor

End with a new line

Contributor Author

I've finished

@skyguard1
Contributor Author

skyguard1 commented Jul 23, 2020

I was going to add Brotli support but got stuck with some projects. If you got some time, would you mind adding Brotli support too, please?

See #6899

Thanks for your review, but I don't know much about this compression algorithm, sorry.

Contributor

@hyperxpro hyperxpro left a comment

Add license headers to all files.

@skyguard1
Contributor Author

skyguard1 commented Jul 23, 2020

Add license headers to all files.

I've finished, appreciate it.

@skyguard1
Contributor Author

skyguard1 commented Jul 30, 2020

Is there any update? 😃

@hyperxpro
Contributor

/cc @normanmaurer

@normanmaurer
Member

I will have a look next week.

@normanmaurer
Member

@chrisvest PTAL when you have a chance

Contributor

@chrisvest chrisvest left a comment

I had some initial comments. I'll review more thoroughly tomorrow.

@skyguard1
Contributor Author

I had some initial comments. I'll review more thoroughly tomorrow.

Thanks for your review

@normanmaurer
Member

@idelpivnitskiy have you had any time yet to review this one?

Member

@idelpivnitskiy idelpivnitskiy left a comment

Hi @skyguard1,

Thank you for this huge effort in adding a new compression codec! This is a complex task.
I have comments for the existing implementation, but before addressing them let's discuss the format of Zstd blocks first, because it may impact the current implementation. Please, see my last comment in this review.


public ZstdDecoder(boolean validateCheckSum) {
this.checksum = ByteBufChecksum.wrapChecksum(new CRC32());
this.validateCheckSum = validateCheckSum;
Member

Consider not allocating a ByteBufChecksum object if validation is not needed

Contributor Author

I've done it

Member

Then you don't need this boolean, use checksum != null as a check

Contributor Author

I've done it

super(true);
compressionLevel = compressionLevel(blockSize);
this.blockSize = blockSize;
this.checksum = ByteBufChecksum.wrapChecksum(new CRC32());
Member

In the Zstd documentation [1] I see that they use XXH64 algorithm for checksum:

generate a 32-bits checksum using XXH64 algorithm at end of frame, for error detection

We should use the same in netty by default and provide an option to pass a custom Checksum instance if necessary, similar to Lz4FrameEncoder.

We already have xxhash dependency in netty and can create something similar to Lz4XXHash32 for XX64, with little rework and code sharing.

[1] http://facebook.github.io/zstd/zstd_manual.html

Contributor Author

@skyguard1 skyguard1 Aug 14, 2020

I will try. Does the zstd hash here need to be different from Lz4XXHash32? I don't know much about this.

Member

Lz4XXHash32 uses XXH32 hash defined like a static constant:

private static final XXHash32 XXHASH32 = XXHashFactory.fastestInstance().hash32();

You need a similar class, but for XXHashFactory.fastestInstance().hash64(). You can share most of the code by having an abstract class that takes a hash function in a protected constructor.
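The sharing pattern suggested here — an abstract base class that receives the concrete hash in a protected constructor — might look roughly like the sketch below. The class names are hypothetical, and `java.util.zip.Checksum` with a CRC32 stand-in is used only so the sketch runs without a third-party xxhash dependency; the real classes would wrap `net.jpountz.xxhash` hashes.

```java
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Hypothetical sketch: the abstract base owns all the delegation logic, so an
// XXH32-backed and an XXH64-backed subclass could share it. The concrete hash
// is supplied through the protected constructor.
abstract class AbstractCompressionChecksum implements Checksum {
    private final Checksum hash;

    protected AbstractCompressionChecksum(Checksum hash) {
        this.hash = hash;
    }

    @Override public void update(int b)                      { hash.update(b); }
    @Override public void update(byte[] b, int off, int len) { hash.update(b, off, len); }
    @Override public void reset()                            { hash.reset(); }

    // Zstd's frame checksum is the low 32 bits of the XXH64 value, so the
    // shared base exposes the truncated value; for a 32-bit hash such as
    // CRC32 the mask is a no-op.
    @Override public long getValue() {
        return hash.getValue() & 0xFFFFFFFFL;
    }
}

// Concrete subclass; CRC32 stands in for an xxhash implementation here.
final class Crc32CompressionChecksum extends AbstractCompressionChecksum {
    Crc32CompressionChecksum() {
        super(new CRC32());
    }
}
```

A 64-bit variant would only differ in the hash passed to `super(...)` and in how the 64-bit result is truncated.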

Contributor Author

I've done it

out.setIntLE(idx + COMPRESSED_LENGTH_OFFSET, compressedLength);
out.setIntLE(idx + DECOMPRESSED_LENGTH_OFFSET, flushableBytes);
out.setIntLE(idx + DEFAULT_CHECKSUM_OFFSET, check);
out.writerIndex(idx + HEADER_LENGTH + compressedLength);
Member

@idelpivnitskiy idelpivnitskiy Aug 13, 2020

This is a specific custom protocol, not an implementation that follows a certain framework

Thanks for confirming. I notice that this PR follows the format similar to Lz4FrameEncoder, but no other library follows the same format for Zstd.

The reason why we use this format for LZ4 codec is because lz4-java [1] was the only one java implementation of LZ4 back in 2014. Therefore, we decided to follow the same format as LZ4BlockOutputStream [2] to make netty implementation compatible with any other java system that may use the same library. We verify this compatibility in Lz4FrameEncoderTest and Lz4FrameDecoderTest.

Time flies and I see that the official LZ4 website [3] now recommends Apache Commons Compress as an interoperable java LZ4 port. Apache added support for LZ4 in 2017 [4].

Zstd has a broader scope and applicability compared to LZ4. It's an official standard [5] that defines the format of compressed frames and the frame header. The standard also defines the application/zstd HTTP media type and zstd content-encoding. The Zstd algorithm is much more widely used for different use-cases, including HTTP.

Taking that into account, I would like to discuss with you the motivation for adding Zstd codec with a custom header that is not supported by any other non-netty system. Could you please describe your use-case and how this codec is intended to be used?

It will be highly appreciated if we can follow the official standard and implement a codec with the interoperable Zstd format. It will allow us to reuse your work in the HTTP codec and communicate with any other systems that use Zstd. The impact of contributing an interoperable version will be huge. And we will be able to use zstd-jni [6] Zstd[Input|Output]Stream to verify the compatibility.

[1] https://github.com/lz4/lz4-java
[2] https://github.com/lz4/lz4-java/blob/master/src/java/net/jpountz/lz4/LZ4BlockOutputStream.java
[3] https://lz4.github.io/lz4/
[4] https://github.com/apache/commons-compress/commits/master/src/main/java/org/apache/commons/compress/compressors/lz4/FramedLZ4CompressorInputStream.java
[5] https://tools.ietf.org/html/rfc8478
[6] https://github.com/luben/zstd-jni

Contributor Author

Thanks for your suggestion. Do you mean that the header needs to be removed, or are other changes needed?

Member

I'm suggesting to implement a header format defined by the Zstd RFC [1] instead of defining a custom header that can not be understood by any other non-netty system. We should have a strong reason to implement a custom header instead of following the RFC. If you have this reason, please provide more context about why you chose the current header format.

zstd-jni will be very helpful to do most of the work. In netty, look at how ZlibEncoder and ZlibDecoder are implemented; gzip/deflate have a format similar to Zstd's.

[1] https://tools.ietf.org/html/rfc8478
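For reference, the fixed part of an RFC 8478 frame is small: a 4-byte little-endian magic number followed by the Frame_Header_Descriptor byte. A minimal parsing sketch (class and field names are illustrative, not taken from this PR) could look like:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative sketch of the start of an RFC 8478 zstd frame.
final class ZstdFrameHeader {
    static final int MAGIC_NUMBER = 0xFD2FB528;

    final boolean singleSegment;
    final boolean hasContentChecksum;
    final int dictionaryIdBytes;     // 0, 1, 2 or 4
    final int frameContentSizeBytes; // 0, 1, 2, 4 or 8

    private ZstdFrameHeader(boolean singleSegment, boolean hasChecksum,
                            int dictIdBytes, int fcsBytes) {
        this.singleSegment = singleSegment;
        this.hasContentChecksum = hasChecksum;
        this.dictionaryIdBytes = dictIdBytes;
        this.frameContentSizeBytes = fcsBytes;
    }

    static ZstdFrameHeader parse(ByteBuffer in) {
        in.order(ByteOrder.LITTLE_ENDIAN);
        if (in.getInt() != MAGIC_NUMBER) {
            throw new IllegalArgumentException("not a zstd frame");
        }
        int fhd = in.get() & 0xFF;
        int fcsFlag = fhd >>> 6;            // bits 7-6: Frame_Content_Size_flag
        boolean single = (fhd & 0x20) != 0; // bit 5: Single_Segment_flag
        boolean check = (fhd & 0x04) != 0;  // bit 2: Content_Checksum_flag
        int didFlag = fhd & 0x03;           // bits 1-0: Dictionary_ID_flag
        int[] didBytes = {0, 1, 2, 4};
        // FCS field size: flag 0 means the field is absent, unless
        // Single_Segment is set, in which case a 1-byte field is present.
        int fcsBytes = fcsFlag == 0 ? (single ? 1 : 0) : 1 << fcsFlag;
        return new ZstdFrameHeader(single, check, didBytes[didFlag], fcsBytes);
    }
}
```

The variable-length fields that follow the descriptor (window descriptor, dictionary ID, frame content size) are driven entirely by these flags, which is why parsing just this one byte tells a decoder how many more header bytes to wait for.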

Member

@idelpivnitskiy idelpivnitskiy Aug 14, 2020

The main point here is that whatever ZstdEncoder produces, other systems should be able to decompress, and vice versa. For the tests we can use the ZstdInputStream and ZstdOutputStream classes from zstd-jni. See Bzip2EncoderTest and Bzip2DecoderTest as examples of how to verify the output of the encoder/decoder with Input/Output streams.

Member

Before you address other comments, let's finish this discussion and come to some agreement. Because if we decide to follow the standard it will require significant changes of the current encoder and decoder implementation and my other comments won't be relevant anymore.

// cc @normanmaurer @trustin wdyt about having a non-standard header for Zstd?

Member

@skyguard1 as @idelpivnitskiy mentioned, we need to follow the RFC and implement the headers the way they are defined there. Otherwise it will be impossible to interop with other implementations out in the wild.

@skyguard1
Contributor Author

skyguard1 commented Aug 14, 2020

Hi @skyguard1,

Thank you for this huge effort in adding a new compression codec! This is a complex task.
I have comments for the existing implementation, but before addressing them let's discuss the format of Zstd blocks first, because it may impact the current implementation. Please, see my last comment in this review.

Thanks for your great effort on the code review. I will take a look and make the corresponding changes, appreciate it.


public ZstdDecoder(boolean validateCheckSum) {
this.checksum = ByteBufChecksum.wrapChecksum(new CRC32());
this.validateCheckSum = validateCheckSum;
Member

Then you don't need this boolean, use checksum != null as a check

// by updating currentState, reset all numbers and checksum.
compressedLength = 0;
decompressedLength = 0;
currentChecksum = 0;
Member

Please, also clear the blockType and reset checksum

Contributor Author

I've done it

* See <a href="https://facebook.github.io/zstd">Zstandard</a>.
* Please note that checksum validate is enabled by default if you use the default constructor
* if you want to disabled checksum validate,please use {@link ZstdDecoder(boolean)} constructor
* set to {@code false}
Member

You can remove this note if constructors have their own javadoc, describing this behavior

Contributor Author

I've done it

private boolean validateCheckSum;

/**
* Same as {@link #ZstdDecoder(boolean)} with validateCheckSum = true
Member

Suggestion (similar to Lz4FrameDecoder):

    /**
     * Creates a new Zstd decoder.
     *
     * Note that by default, validation of the checksum header in each chunk is
     * DISABLED for performance improvements. If performance is less of an issue,
     * or if you would prefer the safety that checksum validation brings, please
     * use the {@link #ZstdDecoder(boolean)} constructor with the argument
     * set to {@code true}.
     */

Contributor Author

I've done it

/**
* Construct a new ZstdDecoder.
* @param validateCheckSum
* Whether to enable checksum verification
Member

Suggestion:

    /**
     * Creates a new Zstd decoder.
     *
     * @param validateCheckSum  if {@code true}, the checksum field will be validated against the actual
     *                          uncompressed data, and if the checksums do not match, a suitable
     *                          {@link DecompressionException} will be thrown

Contributor Author

I've done it

footer.setByte(idx + TOKEN_OFFSET, (byte) (BLOCK_TYPE_NON_COMPRESSED | compressionLevel));
footer.setInt(idx + COMPRESSED_LENGTH_OFFSET, 0);
footer.setInt(idx + DECOMPRESSED_LENGTH_OFFSET, 0);
footer.setInt(idx + DEFAULT_CHECKSUM_OFFSET, 0);
Member

6 params (buffer, idx, blockType, compressedLength, decompressedLength, checksum) sounds ok for the internal private method.

codec/pom.xml Outdated
@@ -30,6 +30,7 @@

<properties>
<javaModuleName>io.netty.codec</javaModuleName>
<ztsd-version>1.4.5-6</ztsd-version>
Member

Thanks! This property can be removed now because the version is managed by parent pom file.


import static io.netty.handler.codec.compression.ZstdConstants.*;

public class ZstdEncoderTest {
Member

Please port the required use-cases to ZstdIntegrationTest; a separate test class for ZstdEncoder is not necessary.

super(true);
compressionLevel = compressionLevel(blockSize);
this.blockSize = blockSize;
this.checksum = ByteBufChecksum.wrapChecksum(new CRC32());
Member

Lz4XXHash32 uses XXH32 hash defined like a static constant:

private static final XXHash32 XXHASH32 = XXHashFactory.fastestInstance().hash32();

You need a similar class, but for XXHashFactory.fastestInstance().hash64(). You can share most of the code by having an abstract class that takes a hash function in a protected constructor.

@skyguard1
Contributor Author

skyguard1 commented Aug 17, 2020

I made some changes. The biggest problem is implementing the standard protocol format, which will require huge changes; anyone is welcome to make suggestions.

@normanmaurer
Member

@skyguard1 please address comments of @idelpivnitskiy and ping me once ready for review again

@skyguard1
Contributor Author

@normanmaurer Sorry, I am trying to implement the RFC 8478 specification; it is really a bit complicated. I will tell you when I finish, thanks.

@chrisvest
Contributor

@skyguard1 What's the source of the complexity? I would've thought the zstd library would provide most of what's needed?

@skyguard1
Contributor Author

@chrisvest I need to implement the RFC 8478 specification, which requires huge modifications to the original logic, and I have to do more tests to ensure that it is interoperable.

@idelpivnitskiy
Member

idelpivnitskiy commented Aug 27, 2020

@skyguard1 I would be happy to help if you have any questions or need guidance on the approach. Feel free to ask in this thread.

I would propose to implement it in stages. It's fine to send multiple PRs. For example, you can split the work in this way:

  1. ZstdEncoder - it should be relatively simple, because you can just use the Zstd library to compress data. The only thing you need to do there is to figure out the correct size of the output ByteBuf to minimize redundant allocations.
  2. ZstdDecoder - this is an interesting part. You do not need to implement the full rfc8478 specification, the Zstd library should be used for the actual decompression. The only important bit there is a header format. Most likely (if Zstd library does not help with that*) you need to parse the header to understand which blocks are compressed and which are not + the decompressed size of the block. This info is helpful to allocate necessary ByteBufs.

* zstd format is similar to gzip. Take a look at existing ZlibDecoder and its two implementations: JZlibDecoder & JdkZlibDecoder:

  • JZlibDecoder uses com.jcraft.jzlib library that does not require netty to parse the header, it does all the work.
  • JdkZlibDecoder uses java.util.zip that is less flexible and required netty to parse the gzip header.

Try to dig into the zstd library to understand its capabilities better and decide on the decoder approach.

Testing: do not spend time on excessive testing during development. We can decide during the review if additional testing is necessary. As a start, use the existing AbstractEncoderTest and AbstractDecoderTest classes. They contain all required tests to check compatibility between netty's codec and the original implementation from the 3rd-party library. You can use these classes to verify that the output is compatible with ZstdInputStream & ZstdOutputStream.
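The JZlibDecoder-style approach described above — the library consumes the header, so the caller never parses it — can be illustrated with a runnable round-trip. The JDK's gzip streams stand in here for zstd-jni's ZstdOutputStream/ZstdInputStream, which expose the same wrap-a-stream shape of API; the class name is made up for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Round-trip through a frame format whose header is produced and consumed
// entirely by the library, so application code never touches it.
final class StreamRoundTrip {
    static byte[] compress(byte[] input) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bos)) {
            gzip.write(input); // header + compressed data + trailer written for us
        }
        return bos.toByteArray();
    }

    static byte[] decompress(byte[] compressed) throws IOException {
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gzip.readAllBytes(); // header parsed and checksum verified for us
        }
    }
}
```

A decoder test against the reference streams is then just: compress with netty's encoder, decompress with the library's input stream (or vice versa), and compare the bytes.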

@skyguard1
Contributor Author

@idelpivnitskiy I will look at the implementation of zstd in more detail. Thanks for your suggestions, appreciate it.

@hyperxpro
Contributor

@skyguard1 What's the status?

@DinoCassowary

@normanmaurer @skyguard1 do we have any updates on this PR? It looks like the comments have been addressed, but now, six months later, there are some merge conflicts in the pom.xml. I'm very interested in using this ZSTD integration.

@skyguard1
Contributor Author

@DinoCassowary The biggest problem now is that network packets are transmitted in segments, so it is necessary to determine the size of a data block when decoding. I submitted an issue to zstd-jni, hoping it could add the ability to detect data block boundaries. The author of zstd-jni added this, then reverted it; the reason given is that the data stream has no boundaries, although zstd supports reading the data block size. The author suggested adding the data block size at the application-protocol level, which conflicts with a general decode function, and declined to add the ability to determine the size of a data block.
See luben/zstd-jni#165
Do you have any suggestions? Thanks
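The application-protocol workaround suggested by the zstd-jni author — prefix each compressed block with its length so a streaming decoder can find block boundaries — is essentially what netty's LengthFieldPrepender/LengthFieldBasedFrameDecoder pair does. A dependency-free sketch of the idea (class and method names are made up for this example):

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Minimal length-prefix framing: each block is preceded by a 4-byte
// big-endian length, so a decoder can find block boundaries even when the
// transport delivers arbitrary segments.
final class LengthPrefixFraming {
    static byte[] frame(byte[] block) {
        return ByteBuffer.allocate(4 + block.length)
                .putInt(block.length)
                .put(block)
                .array();
    }

    // Consumes as many complete frames as the accumulated buffer contains;
    // a partial frame at the end is left in place for the next read.
    static List<byte[]> deframe(ByteBuffer accumulated) {
        List<byte[]> blocks = new ArrayList<>();
        while (accumulated.remaining() >= 4) {
            accumulated.mark();
            int len = accumulated.getInt();
            if (accumulated.remaining() < len) {
                accumulated.reset(); // incomplete block: wait for more data
                break;
            }
            byte[] block = new byte[len];
            accumulated.get(block);
            blocks.add(block);
        }
        return blocks;
    }
}
```

The trade-off mentioned above stands: this framing is netty-/application-specific, so it conflicts with a general-purpose decoder that must accept plain zstd streams from other systems.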

raidyue pushed a commit to raidyue/netty that referenced this pull request Jul 8, 2022
Motivation:

As discussed in netty#10422, ZstdEncoder can be added separately

Modification:

Add ZstdEncoder separately

Result:

netty supports ZSTD with ZstdEncoder

Signed-off-by: xingrufei <xingrufei@sogou-inc.com>
Co-authored-by: xingrufei <xingrufei@sogou-inc.com>
@BrokenWingsIcarus

@skyguard1
See this implementation: aircompressor. Can it provide what you want?

@skyguard1
Contributor Author

@skyguard1 see this impl aircompressor Can it provide what you want?

This doesn't solve the issue

@skyguard1 skyguard1 mentioned this pull request Aug 6, 2023
Signed-off-by: xingrufei <qhdxssm@qq.com>
@skyguard1 skyguard1 closed this Mar 16, 2024
normanmaurer added a commit that referenced this pull request Mar 18, 2024
Motivation:

Zstandard (https://facebook.github.io/zstd/) is a high-performance compression
algorithm with a high compression ratio. This PR adds netty support for the
Zstandard algorithm. The implementation relies on zstd-jni
(https://github.com/luben/zstd-jni), an open-source third-party library;
Apache Kafka also uses it for message compression.
This is a copy of #10422

Modification:

Add ZstdDecoder and test case.

Result:

netty supports ZSTD with ZstdDecoder

---------

Signed-off-by: xingrufei <qhdxssm@qq.com>
Co-authored-by: xingrufei <xingrufei@yl.com>
Co-authored-by: Norman Maurer <norman_maurer@apple.com>
Co-authored-by: Chris Vest <mr.chrisvest@gmail.com>
normanmaurer added a commit that referenced this pull request Mar 18, 2024
normanmaurer added a commit that referenced this pull request Mar 18, 2024
Motivation:

Zstandard (https://facebook.github.io/zstd/) is a high-performance compression
algorithm with a high compression ratio. This PR adds netty support for the
Zstandard algorithm. The implementation relies on zstd-jni
(https://github.com/luben/zstd-jni), an open-source third-party library;
Apache Kafka also uses it for message compression.
This is a copy of #10422

Modification:

Add ZstdDecoder and test case.

Result:

netty supports ZSTD with ZstdDecoder

Signed-off-by: xingrufei <qhdxssm@qq.com>
Co-authored-by: skyguard1 <qhdxssm@qq.com>
Co-authored-by: xingrufei <xingrufei@yl.com>
Co-authored-by: Chris Vest <mr.chrisvest@gmail.com>