Add zstd to native access #105715

rjernst · 2024-02-22T00:09:54Z

This commit makes zstd compression available to Elasticsearch. The library is pulled in through Maven as a JAR file for each platform, then bundled into a new platform directory under lib. Zstd compression and decompression are accessed through NativeAccess.

jpountz

I can't comment on the build/distribution-related bits, which I'm not too familiar with, but this looks fantastic!

jpountz · 2024-02-22T09:42:54Z

libs/native/src/main/java/org/elasticsearch/nativeaccess/NoopNativeAccess.java

+    @Override
+    public Zstd getZstd() {
+        logger.warn("cannot compress with zstd because native access is not available");
+        return null;


When ZSTD support becomes a requirement in the future, I imagine we'd throw here instead?

Yes, ideally there would be no no-op implementation, we would throw at load time.

Is there a reason we aren't just doing that now?

It's contentious, so I want to save that for an isolated discussion/PR.

This sounds like leniency, which I presumed we had all agreed is abhorrent 😉
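For comparison, a minimal sketch of the two behaviours under discussion: the lenient no-op that logs and returns null, and a fail-fast variant that throws. `ZstdHandle`, `ZstdProvider`, `LenientNoopAccess` and `StrictNoopAccess` are hypothetical stand-ins rather than the actual NativeAccess types, and `java.util.logging` is used only to keep the example self-contained. Throwing at load time, as suggested above, would move the same failure even earlier, into whatever code selects the implementation.

```java
import java.util.logging.Logger;

// Hypothetical stand-in for the Zstd handle returned by getZstd().
interface ZstdHandle {}

// Hypothetical stand-in for the NativeAccess method under discussion.
interface ZstdProvider {
    ZstdHandle getZstd();
}

// Lenient variant (mirrors the reviewed diff): warn and return null,
// leaving every caller responsible for a null check.
final class LenientNoopAccess implements ZstdProvider {
    private static final Logger logger = Logger.getLogger(LenientNoopAccess.class.getName());

    @Override
    public ZstdHandle getZstd() {
        logger.warning("cannot compress with zstd because native access is not available");
        return null;
    }
}

// Fail-fast variant: callers never observe a null handle; a missing
// native library surfaces as an exception at the point of use.
final class StrictNoopAccess implements ZstdProvider {
    @Override
    public ZstdHandle getZstd() {
        throw new UnsupportedOperationException("zstd requires native access, which is not available");
    }
}
```

With the lenient variant every caller of getZstd() needs its own null check; with the strict one, a missing native library cannot be silently ignored.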

jpountz · 2024-02-22T09:48:09Z

libs/native/src/main21/java/org/elasticsearch/nativeaccess/jdk/JdkZstdLibrary.java

+    @Override
+    public long compress(ByteBuffer dst, int dstLen, ByteBuffer src, int srcLen, int compressionLevel) {
+        try (Arena arena = Arena.ofConfined()) {
+            var nativeDst = arena.allocate(dstLen);


Sometimes, dst may be over-allocated. I wonder if we should allocate nativeDst with size `min(dstLen, compressBound(srcLen))`. Hopefully the overhead of the additional native call would be negligible.

Perhaps we should do that in Zstd.compress so it applies to both impls?

Hmm, maybe I'm misunderstanding your suggestion, but it felt to me like this should happen next to where we're allocating the native memory, so it should be in the JDK/JNA impls?
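A minimal sketch of the sizing calculation, wherever it ends up living. To run without the native library, `compressBound` here is a hypothetical Java port of the `ZSTD_COMPRESSBOUND` macro from zstd.h (its maximum-input-size guard is omitted); real code would call `ZSTD_compressBound` through the JDK or JNA binding. `CompressSizing` and `nativeDstSize` are illustrative names.

```java
// Hypothetical helper for sizing the native destination before compressing.
final class CompressSizing {
    // Java port of zstd's ZSTD_COMPRESSBOUND macro: the worst-case compressed
    // size for srcLen input bytes. Real code would call ZSTD_compressBound.
    static long compressBound(long srcLen) {
        long smallInputMargin = srcLen < (128 << 10) ? ((128 << 10) - srcLen) >> 11 : 0;
        return srcLen + (srcLen >> 8) + smallInputMargin;
    }

    // Never reserve more native memory than zstd could ever write.
    static long nativeDstSize(int dstLen, int srcLen) {
        return Math.min(dstLen, compressBound(srcLen));
    }
}
```

Because zstd's worst-case output does not exceed the bound, capping the allocation there should not make compression fail for lack of space; it only avoids reserving native memory that could never be written.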

jpountz · 2024-02-22T09:51:14Z

libs/native/src/main21/java/org/elasticsearch/nativeaccess/jdk/JdkZstdLibrary.java

+    @Override
+    public long decompress(ByteBuffer dst, int dstLen, ByteBuffer src, int srcLen) {
+        try (Arena arena = Arena.ofConfined()) {
+            var nativeDst = arena.allocate(dstLen);


Likewise here for decompression, maybe we should take advantage of the ZSTD_getFrameContentSize API to avoid allocating more native memory than necessary.

Wouldn't getFrameContentSize only find the size of the first frame, yet we could decompress multiple frames?

The compress method exposed right now is guaranteed to create only a single frame (https://github.com/facebook/zstd/blob/dev/lib/zstd.h#L150), so that should always be the case, but you have a point that we may want to decompress buffers compressed by third parties in the future.

We could check if the source length is equal to ZSTD_findFrameCompressedSize to check if there is a single frame. But at this point this is becoming complicated enough that it's probably worth deferring to a follow-up PR.
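A sketch of the decompression side, assuming the binding exposes `ZSTD_getFrameContentSize` and `ZSTD_findFrameCompressedSize` (both real zstd functions) and passes their results in as Java longs. The helper falls back to the caller's `dstLen` whenever the input is not exactly one frame or the frame header does not record a usable content size. `DecompressSizing` and its parameters are hypothetical.

```java
// Hypothetical helper for sizing the native destination before decompressing.
final class DecompressSizing {
    // zstd.h defines these as (0ULL - 1) and (0ULL - 2); read as a signed
    // Java long they become -1 and -2.
    static final long ZSTD_CONTENTSIZE_UNKNOWN = -1L;
    static final long ZSTD_CONTENTSIZE_ERROR = -2L;

    /**
     * @param dstLen               capacity offered by the caller
     * @param srcLen               number of compressed input bytes
     * @param frameContentSize     result of ZSTD_getFrameContentSize(src, srcLen)
     * @param firstFrameCompressed result of ZSTD_findFrameCompressedSize(src, srcLen)
     */
    static long nativeDstSize(int dstLen, int srcLen, long frameContentSize, long firstFrameCompressed) {
        // More than one frame (or an error code, which as a signed long is
        // negative and never equals srcLen): the first frame's size is not the total.
        if (firstFrameCompressed != srcLen) {
            return dstLen;
        }
        // The header may omit the content size, or the input may be malformed.
        if (frameContentSize == ZSTD_CONTENTSIZE_UNKNOWN || frameContentSize == ZSTD_CONTENTSIZE_ERROR) {
            return dstLen;
        }
        return Math.min(dstLen, frameContentSize);
    }
}
```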

jpountz · 2024-02-22T09:54:35Z

libs/native/jna/src/main/java/org/elasticsearch/nativeaccess/jna/JnaZstdLibrary.java

+        assert srcLen != 0;
+        try (Memory nativeDst = new Memory(dstLen);
+             Memory nativeSrc = new Memory(srcLen)) {
+            nativeSrc.write(0, src.array(), src.position(), srcLen);


You already assert that the buffer is not direct, but since you're using array() here, I guess the buffer must not be read-only either. I wonder how much work it would be to make this work transparently across direct, heap, and read-only byte buffers (could be a follow-up).

I think I can make it work; it just needs the right ByteBuffer calls to be backing-agnostic.
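One backing-agnostic approach, sketched under the assumption that an extra heap copy is acceptable: read through `duplicate()` instead of `array()`, which behaves the same for heap, direct, and read-only buffers and leaves the caller's position untouched. `BufferCopy.copyFromPosition` is a hypothetical helper name.

```java
import java.nio.ByteBuffer;

final class BufferCopy {
    // Copies len bytes starting at src.position() without calling array(),
    // so heap, direct, and read-only buffers are all handled the same way.
    static byte[] copyFromPosition(ByteBuffer src, int len) {
        if (len < 0 || len > src.remaining()) {
            throw new IllegalArgumentException("len " + len + " exceeds remaining " + src.remaining());
        }
        // duplicate() shares the content but has its own position and limit,
        // so reading from it does not move the caller's buffer.
        ByteBuffer view = src.duplicate();
        view.limit(view.position() + len);
        byte[] out = new byte[len];
        view.get(out);
        return out;
    }
}
```

In the JNA impl, `nativeSrc.write(0, src.array(), src.position(), srcLen)` could then become `nativeSrc.write(0, BufferCopy.copyFromPosition(src, srcLen), 0, srcLen)`. Putting a limited duplicate directly into `nativeSrc.getByteBuffer(0, srcLen)` could avoid the intermediate array.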

This commit upgrades jna to 5.12.1, which supports better control over releasing native memory. relates #105715

The build handles platform-specific code, which may be for ARM or x86. Yet there are multiple ways to describe 64-bit x86, and the build converts between them in several places. This commit consolidates on the x64 nomenclature in most places, except where necessary (e.g. ML still uses x86_64). relates elastic#105715
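A minimal sketch of the kind of normalization this consolidates: the JVM's `os.arch` can report different strings for the same 64-bit x86 hardware depending on the platform (e.g. `amd64` or `x86_64`), so mapping them once to `x64` keeps directory names consistent. The `PlatformDir` class and the `<os>-<arch>` naming scheme are assumptions for illustration, not necessarily what the build uses.

```java
import java.util.Locale;

// Hypothetical helper: maps JVM system property values to a single
// "<os>-<arch>" directory name, always spelling 64-bit x86 as "x64".
final class PlatformDir {
    static String normalizeArch(String osArch) {
        switch (osArch.toLowerCase(Locale.ROOT)) {
            case "amd64":
            case "x86_64":
                return "x64";
            case "aarch64":
            case "arm64":
                return "aarch64";
            default:
                throw new IllegalArgumentException("unsupported architecture: " + osArch);
        }
    }

    static String normalizeOs(String osName) {
        String name = osName.toLowerCase(Locale.ROOT);
        if (name.startsWith("linux")) {
            return "linux";
        }
        if (name.startsWith("mac os")) {
            return "darwin";
        }
        if (name.startsWith("windows")) {
            return "windows";
        }
        throw new IllegalArgumentException("unsupported OS: " + osName);
    }

    static String platformDir(String osName, String osArch) {
        return normalizeOs(osName) + "-" + normalizeArch(osArch);
    }
}
```

At runtime it would be called as `PlatformDir.platformDir(System.getProperty("os.name"), System.getProperty("os.arch"))`.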

mark-vieira · 2024-03-08T20:40:56Z

@rjernst I'm wondering if we need some packaging tests here. As it is, there's nothing that's actually exercising zstd in a packaged distribution. Presumably, we'll eventually add functionality that relies on zstd that will be executed via REST tests and maybe that'll be good enough?

rjernst · 2024-03-08T23:34:48Z

Presumably, we'll eventually add functionality that relies on zstd that will be executed via REST tests and maybe that'll be good enough?

Yes, that's what I've been expecting.

jpountz · 2024-03-12T15:11:42Z

libs/native/src/main21/java/org/elasticsearch/nativeaccess/jdk/JdkCloseableByteBuffer.java

+    private final ByteBuffer bufferView;
+
+    JdkCloseableByteBuffer(int len) {
+        this.arena = Arena.ofShared();


I'm not too familiar with arenas: what would be the downsides of using a shared arena if, e.g., a confined arena would have worked as well?

A confined arena restricts access to the thread that created it (its owner thread), so the buffer could not be used from a different thread.
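The difference can be seen in a small sketch, assuming JDK 22 or later, where `java.lang.foreign` is final (on JDK 21 it is a preview API and needs `--enable-preview`). A segment from a confined arena throws `WrongThreadException` when read from another thread, while a segment from a shared arena can be read. `ArenaThreads` is a hypothetical helper.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.util.concurrent.atomic.AtomicReference;

final class ArenaThreads {
    // Allocates from the given arena, then reads the segment from a second
    // thread. Returns the exception that thread hit, or null on success.
    static Throwable readFromAnotherThread(Arena arena) {
        MemorySegment segment = arena.allocate(16);
        segment.set(ValueLayout.JAVA_BYTE, 0, (byte) 42);
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread reader = new Thread(() -> {
            try {
                segment.get(ValueLayout.JAVA_BYTE, 0);
            } catch (Throwable t) {
                failure.set(t);
            }
        });
        reader.start();
        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return failure.get();
    }
}
```

A shared arena therefore fits a buffer such as JdkCloseableByteBuffer that may be handed between threads; a confined arena only works if allocation, every access, and close all happen on the owner thread.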

jpountz · 2024-03-12T15:13:46Z

libs/native/src/main/java/org/elasticsearch/nativeaccess/Zstd.java

+    public int compress(ByteBuffer dst, ByteBuffer src, int level) {
+        Objects.requireNonNull(dst, "Null destination buffer");
+        Objects.requireNonNull(src, "Null source buffer");
+        long ret = zstdLib.compress(dst, src, level);


Should we check that both buffers are direct non-read-only buffers and fail otherwise?

Yes, I have some of that checked in the impls, but here would be better.
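A minimal sketch of centralizing those checks in Zstd before delegating to either impl. `requireDirectWritable` is a hypothetical helper; strictly only the destination needs to be writable, but the sketch follows the suggestion of checking both buffers. The null-check messages match the existing code.

```java
import java.nio.ByteBuffer;
import java.util.Objects;

final class ZstdBufferChecks {
    // Rejects buffers the native impls cannot handle: each buffer must be
    // non-null, direct, and writable.
    static void requireDirectWritable(ByteBuffer buffer, String name) {
        Objects.requireNonNull(buffer, "Null " + name + " buffer");
        if (buffer.isDirect() == false) {
            throw new IllegalArgumentException(name + " buffer must be direct");
        }
        if (buffer.isReadOnly()) {
            throw new IllegalArgumentException(name + " buffer must not be read-only");
        }
    }
}
```

compress would then start with `requireDirectWritable(dst, "destination")` and `requireDirectWritable(src, "source")` in place of the two `Objects.requireNonNull` calls.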

Add zstd to native access

8db0b43

rjernst added the :Core/Infra/Core label Feb 22, 2024

elasticsearchmachine added the v8.14.0 label Feb 22, 2024

rjernst added 2 commits February 21, 2024 18:45

fix distribution build

f154435

remove debug and leftover

76a14cb

rjernst mentioned this pull request Feb 22, 2024

Upgrade jna to 5.12.1 #105717

Merged

jpountz reviewed Feb 22, 2024

elasticsearchmachine pushed a commit that referenced this pull request Feb 22, 2024

Upgrade jna to 5.12.1 (#105717)

5b82681

rjernst added 4 commits February 22, 2024 12:00

spotless

887d6e6

use correct arch

2990f1d

Merge branch 'main' into native/zstd

a2da8ce

better normalize platform

f4a9762

rjernst mentioned this pull request Feb 28, 2024

Standardize build distribution internals on os/architecture #105842

Merged

rjernst added 6 commits February 28, 2024 16:46

adjust x64 rewrite of libs dirs

890ca1b

fix platform dir

244c616

fix packages

522e636

better os/arch selection and fallback

3a46b48

x64 for tests

16bc338

fixup extracted libs dir

ed0d322

rjernst mentioned this pull request Feb 28, 2024

Standardize build distribution internals on os/architecture (#105842) #105846

Merged

rjernst added 2 commits February 28, 2024 22:20

Merge branch 'main' into native/zstd

1f811e2

windows too

14e374e

rjernst added 2 commits February 29, 2024 04:37

add tests for jvm options

dbd9bf5

some build tests

0454d8f

use default artifact and move maven config to libraries project

4232289

mark-vieira and others added 7 commits March 8, 2024 15:57

Fix distribution artifact resolution

9929d94

Merge branch 'main' into native/zstd

9bbbb3e

spotless

e8a721b

cleanup

ffcfcd8

fix build tests

96c42e4

give base a name

823d68e

Merge branch 'main' into native/zstd

3296253

jpountz reviewed Mar 12, 2024

move/add buffer checks

92d0d48

jpountz approved these changes Mar 12, 2024

rjernst added 3 commits March 12, 2024 08:25

better packaging test debugging

c31d6f5

fix library path

ecce0d2

spotless

529364a

rjernst added the auto-merge label Mar 12, 2024

rjernst added 4 commits March 12, 2024 10:26

use nio

0dd4b5e

more debugging

1732a13

use warn so it's easier to find

2b0eb9b

path separator

4e253f2

rjernst removed the auto-merge label Mar 12, 2024

rjernst added 3 commits March 12, 2024 18:30

spotless

9bf0e11

Merge branch 'main' into native/zstd

a887fe6

remove debug

895b8e0

rjernst merged commit 405b88b into elastic:main Mar 13, 2024
14 checks passed

rjernst deleted the native/zstd branch March 13, 2024 16:45

jpountz mentioned this pull request Mar 13, 2024

Cut over stored fields to ZSTD for compression. #103374

Merged

rjernst mentioned this pull request Mar 13, 2024

Fix test lib path separator on windows #106333

Merged
