Add zstd to native access #105715
Conversation
This commit makes zstd compression available to Elasticsearch. The library is pulled in through maven in jar files for each platform, then bundled in a new platform directory under lib. Access to the zstd compression/decompression is through NativeAccess.
I can't comment on the build/distribution-related bits that I'm not too familiar with but this looks fantastic!
```java
@Override
public Zstd getZstd() {
    logger.warn("cannot compress with zstd because native access is not available");
    return null;
```
When ZSTD support becomes a requirement in the future, I imagine we'd throw here instead?
Yes, ideally there would be no no-op implementation; we would throw at load time.
Is there a reason we aren't just doing that now?
It's contentious, so I want to save that for an isolated discussion/PR.
This sounds like leniency, which I presumed we had all agreed is abhorrent 😉
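The strict alternative being debated could look roughly like this. This is a sketch, not the PR's code: `Zstd` here is a stand-in for the real interface, and the class and message are illustrative.

```java
// Sketch of the "throw instead of no-op" alternative discussed above.
// Zstd is a stand-in for the real interface; names are illustrative.
interface Zstd {}

final class NoopNativeAccess {
    public Zstd getZstd() {
        // Instead of logging a warning and returning null, fail loudly.
        throw new UnsupportedOperationException(
            "cannot compress with zstd because native access is not available");
    }
}
```

Failing loudly moves the error from the (possibly distant) first use of the null result to the point where native access is first requested.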
```java
@Override
public long compress(ByteBuffer dst, int dstLen, ByteBuffer src, int srcLen, int compressionLevel) {
    try (Arena arena = Arena.ofConfined()) {
        var nativeDst = arena.allocate(dstLen);
```
Sometimes, dst may be over-allocated. I wonder if we should allocate `nativeDst` with size `min(dstLen, compressBound(srcLen))`. Hopefully the overhead of the additional native call would be negligible.
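A minimal sketch of that clamping. The bound below mirrors zstd's `ZSTD_COMPRESSBOUND` macro (the worst-case compressed size for a given source size), computed in Java rather than through an extra native call; the class and method names are illustrative.

```java
final class CompressAlloc {
    // Worst-case compressed size for srcLen input bytes, mirroring zstd's
    // ZSTD_COMPRESSBOUND macro: srcSize + (srcSize >> 8), plus a small
    // margin for sources under 128 KiB.
    static long compressBound(long srcLen) {
        long margin = srcLen < 128 * 1024 ? (128 * 1024 - srcLen) >> 11 : 0;
        return srcLen + (srcLen >> 8) + margin;
    }

    // Never allocate more native memory than the compressor could ever write.
    static long nativeDstSize(long dstLen, long srcLen) {
        return Math.min(dstLen, compressBound(srcLen));
    }
}
```

For a 1 KiB source, `compressBound` is only slightly above the source size, so a multi-megabyte `dstLen` would no longer force a multi-megabyte native allocation.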
Perhaps we should do that in `Zstd.compress` so it applies to both impls?
Hmm, maybe I'm misunderstanding your suggestion, but it felt to me like this should happen next to where we're allocating the native memory, so it should be in the JDK/JNA impls?
```java
@Override
public long decompress(ByteBuffer dst, int dstLen, ByteBuffer src, int srcLen) {
    try (Arena arena = Arena.ofConfined()) {
        var nativeDst = arena.allocate(dstLen);
```
Likewise here for decompression, maybe we should take advantage of the `ZSTD_getFrameContentSize` API to avoid allocating more native memory than necessary.
Wouldn't `getFrameContentSize` only find the size of the first frame, yet we could decompress multiple frames?
The `compress` method that is exposed right now is guaranteed to only create a single frame: https://github.com/facebook/zstd/blob/dev/lib/zstd.h#L150 so that should always be the case, but you have a point that we may want to decompress buffers compressed by third parties in the future.
We could check whether the source length is equal to `ZSTD_findFrameCompressedSize` to confirm there is a single frame. But at this point this is becoming complicated enough that it's probably worth deferring to a follow-up PR.
```java
assert srcLen != 0;
try (Memory nativeDst = new Memory(dstLen);
     Memory nativeSrc = new Memory(srcLen)) {
    nativeSrc.write(0, src.array(), src.position(), srcLen);
```
You already have asserts that the buffer is not direct, but since you're using `array()` here, I guess that the buffer needs to not be read-only either. I wonder how much work it would be to make this work transparently across direct, heap and read-only byte buffers. (could be a follow-up)
I think I can make it work; it just needs the right ByteBuffer calls to be backing-agnostic.
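A backing-agnostic copy can go through the `ByteBuffer` API itself instead of `array()`. A sketch (hypothetical helper, not the PR's code):

```java
import java.nio.ByteBuffer;

final class Buffers {
    // Copy srcLen bytes starting at src's position without mutating src.
    // duplicate() shares the content but has an independent position/limit,
    // and relative get() works for heap, direct and read-only buffers alike.
    static byte[] copyFrom(ByteBuffer src, int srcLen) {
        byte[] out = new byte[srcLen];
        src.duplicate().get(out);
        return out;
    }
}
```

The same idea works in the other direction: writing results back through `put()` on a duplicate rather than into `dst.array()` keeps the code indifferent to the buffer's backing.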
This commit upgrades jna to 5.12.1, which supports better control over releasing native memory. relates #105715
The build handles platform specific code which may be for arm or x86. Yet there are multiple ways to describe 64bit x86, and the build converts between the two in several places. This commit consolidates on the x64 nomenclature in most places, except where necessary (eg ML still uses x86_64). relates elastic#105715
@rjernst I'm wondering if we need some packaging tests here. As it is, there's nothing that's actually exercising zstd in a packaged distribution. Presumably, we'll eventually add functionality that relies on zstd that will be executed via REST tests and maybe that'll be good enough?
Yes, that's what I've been expecting.
```java
private final ByteBuffer bufferView;

JdkCloseableByteBuffer(int len) {
    this.arena = Arena.ofShared();
```
I'm not too familiar with arenas, what would be the downsides of using a shared arena if e.g. a confined arena would have worked as well?
A confined arena restricts usage to the current thread, so the buffer could not be used in a different thread.
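The difference can be demonstrated directly: reading memory allocated from a confined arena throws `WrongThreadException` off the owning thread, while a shared arena's memory is readable from any thread. A minimal sketch using the JDK 21+ FFM API (helper name is illustrative):

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

final class ArenaDemo {
    // Returns true if seg can be read from a freshly spawned thread.
    static boolean readableFromOtherThread(MemorySegment seg) {
        boolean[] ok = new boolean[1];
        Thread t = new Thread(() -> {
            try {
                seg.get(ValueLayout.JAVA_BYTE, 0);
                ok[0] = true;
            } catch (WrongThreadException e) {
                ok[0] = false; // confined arenas reject cross-thread access
            }
        });
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return ok[0];
    }
}
```

The trade-off is that closing a shared arena is more expensive than closing a confined one, since the runtime must guarantee no other thread is mid-access.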
```java
public int compress(ByteBuffer dst, ByteBuffer src, int level) {
    Objects.requireNonNull(dst, "Null destination buffer");
    Objects.requireNonNull(src, "Null source buffer");
    long ret = zstdLib.compress(dst, src, level);
```
Should we check that both buffers are direct non-read-only buffers and fail otherwise?
Yes, I have some of that checked in the impls, but here would be better.
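The guard suggested here could be a small check at the API boundary, something like the sketch below (class and method names are illustrative, following the reviewer's suggestion to require direct, writable buffers):

```java
import java.nio.ByteBuffer;

final class BufferChecks {
    // Fail fast on buffers the native layer cannot use. The review above
    // suggests requiring direct, non-read-only buffers at this level rather
    // than asserting in each impl.
    static void requireDirectWritable(ByteBuffer buf, String name) {
        if (buf.isDirect() == false) {
            throw new IllegalArgumentException(name + " must be a direct buffer");
        }
        if (buf.isReadOnly()) {
            throw new IllegalArgumentException(name + " must not be read-only");
        }
    }
}
```

Putting the check in the shared `compress`/`decompress` entry points means both the JDK and JNA impls get the same failure behavior for free.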