-
-
Notifications
You must be signed in to change notification settings - Fork 15.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature]Add zstd decoder #10422
[Feature]Add zstd decoder #10422
Conversation
Can one of the admins verify this patch? |
3 similar comments
Can one of the admins verify this patch? |
Can one of the admins verify this patch? |
Can one of the admins verify this patch? |
I was going to add Brotli support but got stuck with some projects. If you got some time, would you mind adding Brotli support too, please? See #6899 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work! 👍 Just minor code style fixes.
codec/src/main/java/io/netty/handler/codec/compression/ZstdConstants.java
Show resolved
Hide resolved
@@ -0,0 +1,33 @@ | |||
package io.netty.handler.codec.compression; | |||
|
|||
public class ZstdConstants { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be final
|
||
private boolean validateCheckSum; | ||
|
||
public ZstdDecoder() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove extra lines from 30,32,34,36,38,40.
return close(ctx().newPromise()); | ||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, remove all double and extra lines between variables.
return start + random.nextInt(end - start + 1); | ||
} | ||
|
||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
End with a new line
Random random = new Random(); | ||
return start + random.nextInt(end - start + 1); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, remove all double and extra lines between variables.
return sb.toString(); | ||
} | ||
|
||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
End with a new line
return sb.toString(); | ||
} | ||
|
||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
End with a new line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've finished
Thanks for your review, but I don't know much about this compression algorithm.sorry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add license headers to all files.
I've finished.appreciate it. |
Is there any update? 😃 |
\cc @normanmaurer |
I will have a look next week. |
@chrisvest PTAL when you have a chance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had some initial comments. I'll review more thoroughly tomorrow.
codec/src/main/java/io/netty/handler/codec/compression/ZstdDecoder.java
Outdated
Show resolved
Hide resolved
codec/src/main/java/io/netty/handler/codec/compression/ZstdDecoder.java
Outdated
Show resolved
Hide resolved
codec/src/main/java/io/netty/handler/codec/compression/ZstdDecoder.java
Outdated
Show resolved
Hide resolved
codec/src/main/java/io/netty/handler/codec/compression/ZstdEncoder.java
Outdated
Show resolved
Hide resolved
codec/src/main/java/io/netty/handler/codec/compression/ZstdEncoder.java
Outdated
Show resolved
Hide resolved
codec/src/main/java/io/netty/handler/codec/compression/ZstdEncoder.java
Outdated
Show resolved
Hide resolved
Thanks for your review |
@idelpivnitskiy you had any time yet to review this one ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @skyguard1,
Thank you for this huge effort in adding a new compression codec! This is a complex task.
I have comments for the existing implementation, but before addressing them let's discuss the format of Zstd blocks first, because it may impact the current implementation. Please, see my last comment in this review.
|
||
public ZstdDecoder(boolean validateCheckSum) { | ||
this.checksum = ByteBufChecksum.wrapChecksum(new CRC32()); | ||
this.validateCheckSum = validateCheckSum; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider not allocating a ByteBufChecksum
object if validation is not needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then you don't need this boolean, use checksum != null
as a check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done it
codec/src/main/java/io/netty/handler/codec/compression/ZstdDecoder.java
Outdated
Show resolved
Hide resolved
codec/src/main/java/io/netty/handler/codec/compression/ZstdDecoder.java
Outdated
Show resolved
Hide resolved
codec/src/main/java/io/netty/handler/codec/compression/ZstdDecoder.java
Outdated
Show resolved
Hide resolved
codec/src/main/java/io/netty/handler/codec/compression/ZstdDecoder.java
Outdated
Show resolved
Hide resolved
codec/src/test/java/io/netty/handler/codec/compression/ZstdDecoderTest.java
Outdated
Show resolved
Hide resolved
super(true); | ||
compressionLevel = compressionLevel(blockSize); | ||
this.blockSize = blockSize; | ||
this.checksum = ByteBufChecksum.wrapChecksum(new CRC32()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the Zstd documentation [1] I see that they use XXH64 algorithm for checksum:
generate a 32-bits checksum using XXH64 algorithm at end of frame, for error detection
We should use the same in netty by default and provide an option to pass a custom Checksum
instance if necessary, similar to Lz4FrameEncoder
.
We already have xxhash
dependency in netty and can create something similar to Lz4XXHash32
for XX64, with little rework and code sharing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will try, does the zstd hash here need to be different from Lz4XXHash32? I don’t know much about this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lz4XXHash32
uses XXH32
hash defined like a static constant:
private static final XXHash32 XXHASH32 = XXHashFactory.fastestInstance().hash32();
You need a similar class, but for XXHashFactory.fastestInstance().hash64()
. You can share most of the code by having an abstract class that takes a hash function in a protected
constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done it
codec/src/main/java/io/netty/handler/codec/compression/ZstdEncoder.java
Outdated
Show resolved
Hide resolved
codec/src/main/java/io/netty/handler/codec/compression/ZstdDecoder.java
Outdated
Show resolved
Hide resolved
out.setIntLE(idx + COMPRESSED_LENGTH_OFFSET, compressedLength); | ||
out.setIntLE(idx + DECOMPRESSED_LENGTH_OFFSET, flushableBytes); | ||
out.setIntLE(idx + DEFAULT_CHECKSUM_OFFSET, check); | ||
out.writerIndex(idx + HEADER_LENGTH + compressedLength); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a specific custom protocol, not an implementation that follows a certain framework
Thanks for confirming. I notice that this PR follows the format similar to Lz4FrameEncoder
, but no other library follows the same format for Zstd.
The reason why we use this format for LZ4 codec is because lz4-java
[1] was the only one java implementation of LZ4 back in 2014. Therefore, we decided to follow the same format as LZ4BlockOutputStream
[2] to make netty implementation compatible with any other java system that may use the same library. We verify this compatibility in Lz4FrameEncoderTest
and Lz4FrameDecoderTest
.
Time flies and I see that the official LZ4 website [3] now recommends Apache Commons Compress as an interoperable java LZ4 port. Apache added support for LZ4 in 2017 [4].
Zstd has a broader scope and applicability compare to LZ4. It's an official standard [5] that defines the format of compressed frames and the frame header. The standard also defines application/zstd
HTTP media-type and zstd
content-encoding. The Zstd algorithm is a lot more used in the world for different use-cases, including HTTP.
Taking that into account, I would like to discuss with you the motivation for adding Zstd codec with a custom header that is not supported by any other non-netty system. Could you please describe your use-case and how this codec is intended to be used?
It will be highly appreciated if we can follow the official standard and implement a codec with the interoperable Zstd format. It will allow us to reuse your work in the HTTP codec and communicate with any other systems that use Zstd. The impact of contributing an interoperable version will be huge. And we will be able to use zstd-jni
[6] Zstd[Input|Output]Stream
to verify the compatibility.
[1] https://github.com/lz4/lz4-java
[2] https://github.com/lz4/lz4-java/blob/master/src/java/net/jpountz/lz4/LZ4BlockOutputStream.java
[3] https://lz4.github.io/lz4/
[4] https://github.com/apache/commons-compress/commits/master/src/main/java/org/apache/commons/compress/compressors/lz4/FramedLZ4CompressorInputStream.java
[5] https://tools.ietf.org/html/rfc8478
[6] https://github.com/luben/zstd-jni
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your suggestion, so do you mean that the headers needs to be removed, or other changes are needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm suggesting to implement a header format defined by the Zstd RFC [1] instead of defining a custom header that can not be understood by any other non-netty system. We should have a strong reason to implement a custom header instead of following the RFC. If you have this reason, please provide more context about why you chose the current header format.
zstd-jni
will be very helpful to do most of the work. In netty, look at how ZlibEncoder
and ZlibDecoder
are implemented. The gzip/deflate have a similar format to Zstd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main point here that whatever ZstdEncoder
produces, other systems should be able to decompress and vice versa. For the test we can use ZstdInputStream
and ZstdOutputStream
classes from zstd-jni. See Bzip2EncoderTest
and Bzip2DecoderTest
as an examples of how to verify output of the encoder/decoder with Input/Output streams.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before you address other comments, let's finish this discussion and come to some agreement. Because if we decide to follow the standard it will require significant changes of the current encoder and decoder implementation and my other comments won't be relevant anymore.
// cc @normanmaurer @trustin wdyt about having a non-standard header for Zstd?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@skyguard1 as @idelpivnitskiy mentioned we need to follow the RFC and implement the headers the way its defined there. Otherwise it will be impossible to interopt with other implementations out in the wild
Thanks for your great effort in code review, I will take a look at it and make corresponding changes.appreciate it. |
|
||
public ZstdDecoder(boolean validateCheckSum) { | ||
this.checksum = ByteBufChecksum.wrapChecksum(new CRC32()); | ||
this.validateCheckSum = validateCheckSum; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then you don't need this boolean, use checksum != null
as a check
// by updating currentState, reset all numbers and checksum. | ||
compressedLength = 0; | ||
decompressedLength = 0; | ||
currentChecksum = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, also clear the blockType
and reset checksum
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done it
* See <a href="https://facebook.github.io/zstd">Zstandard</a>. | ||
* Please note that checksum validate is enabled by default if you use the default constructor | ||
* if you want to disabled checksum validate,please use {@link ZstdDecoder(boolean)} constructor | ||
* set to {@code false} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove this note if constructors have their own javadoc, describing this behavior
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done it
private boolean validateCheckSum; | ||
|
||
/** | ||
* Same as {@link #ZstdDecoder(boolean)} with validateCheckSum = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion (similar to Lz4FrameDecoder
):
/**
* Creates a new Zstd decoder.
*
* Note that by default, validation of the checksum header in each chunk is
* DISABLED for performance improvements. If performance is less of an issue,
* or if you would prefer the safety that checksum validation brings, please
* use the {@link #ZstdDecoder(boolean)} constructor with the argument
* set to {@code true}.
*/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done it
/** | ||
* Construct a new ZstdDecoder. | ||
* @param validateCheckSum | ||
* Whether to enable checksum verification |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion:
/**
* Creates a new Zstd decoder.
*
* @param validateCheckSum if {@code true}, the checksum field will be validated against the actual
* uncompressed data, and if the checksums do not match, a suitable
* {@link DecompressionException} will be thrown
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done it
footer.setByte(idx + TOKEN_OFFSET, (byte) (BLOCK_TYPE_NON_COMPRESSED | compressionLevel)); | ||
footer.setInt(idx + COMPRESSED_LENGTH_OFFSET, 0); | ||
footer.setInt(idx + DECOMPRESSED_LENGTH_OFFSET, 0); | ||
footer.setInt(idx + DEFAULT_CHECKSUM_OFFSET, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
6 params (buffer, idx, blockType, compressedLength, decompressedLength, checksum) sounds ok for the internal private method.
codec/pom.xml
Outdated
@@ -30,6 +30,7 @@ | |||
|
|||
<properties> | |||
<javaModuleName>io.netty.codec</javaModuleName> | |||
<ztsd-version>1.4.5-6</ztsd-version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! This property can be removed now because the version is managed by parent pom file.
codec/src/test/java/io/netty/handler/codec/compression/ZstdDecoderTest.java
Outdated
Show resolved
Hide resolved
|
||
import static io.netty.handler.codec.compression.ZstdConstants.*; | ||
|
||
public class ZstdEncoderTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, port required use-cases to ZstdIntegrationTest
, a separate test class for ZstdEncoder
is not necessary
super(true); | ||
compressionLevel = compressionLevel(blockSize); | ||
this.blockSize = blockSize; | ||
this.checksum = ByteBufChecksum.wrapChecksum(new CRC32()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lz4XXHash32
uses XXH32
hash defined like a static constant:
private static final XXHash32 XXHASH32 = XXHashFactory.fastestInstance().hash32();
You need a similar class, but for XXHashFactory.fastestInstance().hash64()
. You can share most of the code by having an abstract class that takes a hash function in a protected
constructor.
I made some changes, the biggest problem is to implement the standard protocol format, which will have huge changes, anyone can make their own suggestion |
@skyguard1 please address comments of @idelpivnitskiy and ping me once ready for review again |
@normanmaurer,Sorry, I am trying to implement the rfc8478 specification, it is really a bit complicated, I will tell you when I finish, thank |
@skyguard1 What's the source of the complexity? I would've though the zstd library would provide most of what's needed? |
@chrisvest,I need to implement the rfc8478 specification, which has a huge modification to the original logic, and I have to do more tests to ensure that it can be universal |
@skyguard1 I would be happy to help if you have any questions or need guidance on the approach. Feel free to ask in this thread. I would propose to implement it in stages. It's fine to send multiple PRs. For example, you can split the work in this way:
* zstd format is similar to gzip. Take a look at existing
Try to dig into the Testing: do not spend time on excessive testing during the development. We can decided during the review if additional testing is necessary. As a start, use existing |
@idelpivnitskiy,I will look at the implementation of zstd in more detail, thanks for your suggestions, appreciate it |
@skyguard1 What's the status? |
@normanmaurer @skyguard1 do we have any updates on this PR? It looks like the comments have been addressed, but now 6 months later there are some merge conflicts in the pom.xml. I'm very interested in using this integration of ZSTD. |
@DinoCassowary The biggest problem now is that network packets are transmitted in segments, so it is necessary to judge the size of the data block when decoding. I have submitted an issue in |
Motivation: As discussed in netty#10422, ZstdEncoder can be added separately Modification: Add ZstdEncoder separately Result: netty supports ZSTD with ZstdEncoder Signed-off-by: xingrufei <xingrufei@sogou-inc.com> Co-authored-by: xingrufei <xingrufei@sogou-inc.com>
@skyguard1 |
This doesn't solve the issue |
Signed-off-by: xingrufei <qhdxssm@qq.com>
Signed-off-by: xingrufei <qhdxssm@qq.com>
Signed-off-by: xingrufei <qhdxssm@qq.com>
Signed-off-by: xingrufei <qhdxssm@qq.com>
Motivation: Zstandard(https://facebook.github.io/zstd/) is a high performance, high compression ratio compression algorithm,This pr is to add netty support for the zstandard algorithm,The implementation of zstandard algorithm relies on zstd-jni (https://github.com/luben/zstd-jni), which is an openSource third-party library,Apache Kafka is also using this library for message compression processing. This is the copy of #10422 Modification: Add ZstdDecoder and test case. Result: netty supports ZSTD with zstdDecoder --------- Signed-off-by: xingrufei <qhdxssm@qq.com> Co-authored-by: xingrufei <xingrufei@yl.com> Co-authored-by: Norman Maurer <norman_maurer@apple.com> Co-authored-by: Chris Vest <mr.chrisvest@gmail.com>
Motivation: Zstandard(https://facebook.github.io/zstd/) is a high performance, high compression ratio compression algorithm,This pr is to add netty support for the zstandard algorithm,The implementation of zstandard algorithm relies on zstd-jni (https://github.com/luben/zstd-jni), which is an openSource third-party library,Apache Kafka is also using this library for message compression processing. This is the copy of #10422 Modification: Add ZstdDecoder and test case. Result: netty supports ZSTD with zstdDecoder --------- Signed-off-by: xingrufei <qhdxssm@qq.com> Co-authored-by: xingrufei <xingrufei@yl.com> Co-authored-by: Norman Maurer <norman_maurer@apple.com> Co-authored-by: Chris Vest <mr.chrisvest@gmail.com>
Motivation: Zstandard(https://facebook.github.io/zstd/) is a high performance, high compression ratio compression algorithm,This pr is to add netty support for the zstandard algorithm,The implementation of zstandard algorithm relies on zstd-jni (https://github.com/luben/zstd-jni), which is an openSource third-party library,Apache Kafka is also using this library for message compression processing. This is the copy of #10422 Modification: Add ZstdDecoder and test case. Result: netty supports ZSTD with zstdDecoder Signed-off-by: xingrufei <qhdxssm@qq.com> Co-authored-by: skyguard1 <qhdxssm@qq.com> Co-authored-by: xingrufei <xingrufei@yl.com> Co-authored-by: Chris Vest <mr.chrisvest@gmail.com>
Motivation:
Zstandard(https://facebook.github.io/zstd/) is a high performance, high compression ratio compression algorithm,This pr is to add netty support for the zstandard algorithm,The implementation of zstandard algorithm relies on zstd-jni (https://github.com/luben/zstd-jni), which is an openSource third-party library,Apache Kafka is also using this library for message compression processing.Please review this pr,thk
Modification:
Add ZstdDecoder and test case.
Result:
netty supports ZSTD with zstdDecoder