[Demote] ZStd compression from GA / LTS to experimental and release a 2.9.1 patch #9422

Closed
nknize opened this issue Aug 17, 2023 · 86 comments · Fixed by #9431 or #9658
Labels
bug (Something isn't working), decision (Issues requiring a decision), help wanted (Extra attention is needed), Performance (This is for any performance related enhancements or bugs), Severity-Critical, v2.10.0

Comments

@nknize
Collaborator

nknize commented Aug 17, 2023

Is your feature request related to a problem? Please describe.

ZStd was introduced as an experimental compression option for Lucene indexes in #3577. This brought the implementation as a module (installed by default) under the sandbox directory. However, the feature would remain "expert" optional as it would only be installed if users built the distribution themselves and passed sandbox.enabled=true at JVM startup.

Unfortunately this code was prematurely released GA when it was migrated as a top level module a short time later in #7908 without any feature flags. This has now led to users reporting memory leaks along with several other bugs in the zstd jni library. Additionally, Lucene has a long-running discussion on the pros/cons of Zstd as an index compression option, including the reasons it's not provided as a core capability. One of those reasons is the hard dependency on native compiled code, which often leads to portability issues due to glibc differences. These issues have been realized several times both in the OpenSearch bundle (e.g., see the KNN issue where users can't compile on M1 Mac) and the legacy codebase.

For these reasons, we need to address the premature promotion of the zstd library as a GA / LTS feature in core, and (at minimum) release a patched bundle that fixes the critical performance issues and bugs.

Describe the solution you'd like

Quickly build and release a 2.9.1 bundle distribution with five patches.

  1. Zstd-jni memory leak fix in #9403 (Close Zstd Dictionary after execution to avoid any memory leak).
  2. Add a new ZSTD_COMPRESSION_EXPERIMENTAL = "opensearch.experimental.feature.compression.zstd.enabled" feature flag that is set to false by default (forcing users to opt in).
  3. Apply the ZSTD_COMPRESSION_EXPERIMENTAL feature flag both to the CodecService constructor and the CompressionProvider.getCompressors() (used for BlobStoreRepository compression).
  4. Add a DeprecationLogger message that the zstd feature will be moved to a plugin in the next release
  5. Bump the zstd library dependency from 1.5.5-3 to 1.5.5-5 (Bump zstd version to 1.5.5-5, #9431; NEEDS TO BE BACKPORTED)
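
Items 2 and 3 describe an opt-in gate placed in front of codec registration. A minimal sketch of that pattern follows, written in Python for brevity and not taken from the OpenSearch codebase. The flag name is the one proposed above; the `is_enabled` and `available_codecs` helpers, the plain `dict` used for settings, and the codec name strings are hypothetical placeholders.

```python
ZSTD_COMPRESSION_EXPERIMENTAL = "opensearch.experimental.feature.compression.zstd.enabled"

# Illustrative codec names only; not read from any real OpenSearch registry.
BASE_CODECS = ("default", "best_compression")
ZSTD_CODECS = ("zstd", "zstd_no_dict")


def is_enabled(settings: dict, flag: str) -> bool:
    """Treat a missing flag as false so that users must opt in explicitly."""
    return str(settings.get(flag, "false")).strip().lower() == "true"


def available_codecs(settings: dict) -> tuple:
    """Return the codec names to register for the given settings."""
    if is_enabled(settings, ZSTD_COMPRESSION_EXPERIMENTAL):
        return BASE_CODECS + ZSTD_CODECS
    return BASE_CODECS


print(available_codecs({}))
print(available_codecs({ZSTD_COMPRESSION_EXPERIMENTAL: "true"}))
```

Applying the same check at both call sites (codec lookup and repository compression) would keep the two paths from drifting apart.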

In 2.10.0 (or later even) we should decide the following:

  1. Move the ZstdCodec and ZstdNoDictCodec out from being a default module into an optional location (e.g., either an optional plugin or library - note that the BlobStoreRepository compression is already in as an optional library, but it's packaged and included by default so we still need to figure out how to make that optional).
  2. Whether to switch to direct memory or introduce an expert setting that gives users the option to use direct or heap memory when using ZStd compression

Describe alternatives you've considered

  1. Move ZStd codec and BlobStore compression to a module in a patch release.
  2. Revert #7908 (Moving zstd out of sandbox) in a 2.9.1 patch release
@nknize nknize added bug Something isn't working Severity-Critical untriaged Performance This is for any performance related enhancements or bugs v2.9.0 'Issues and PRs related to version v2.9.0' labels Aug 17, 2023
@nknize nknize changed the title [Demote] ZStd compression from GA / LTS to an experimental [Demote] ZStd compression from GA / LTS to experimental and release a 2.9.1 patch Aug 17, 2023
@bbarani
Member

bbarani commented Aug 17, 2023

@nknize we cannot add new features in a patch release (i.e., 2.9.1), only bug fixes and security patches, as per SemVer guidelines. We can definitely get the bug fix in, but I am not sure about adding an experimental flag since that might be considered a feature, so we can target those changes only for the next minor release.

@nknize
Collaborator Author

nknize commented Aug 17, 2023

...we cannot add new features in patch release (i.e 2.9.1) ....per SemVer guidelines. ...I am not sure if adding experimental flag since that would be considered a feature,...

@bbarani Unfortunately we can't allow ZStd to remain available by default because rolling this back requires users to reindex if they chose these codecs. We have to revert this feature ASAP before users start indexing. This was mistakenly released GA, hence the patch.

@andrross
Member

Regarding the option to revert #7908, I just want to note that will leave any users who indexed data with zstd using 2.9.0 in a bad spot. Their only option would be to reindex that data with 2.9.0 before upgrading to 2.9.1, since the 2.9.1 release will not contain the sandbox plugin functionality.

@nknize
Collaborator Author

nknize commented Aug 17, 2023

Great point @andrross! I'm adding context from the follow-on discussion on Slack just so we don't lose it.

Barani Bikshandi (Barani): Yeah but can this be remediated with "revert" rather than adding experimental flag for 2.9.1? Basically remove support for zstd in 2.9.1?

I added this as an alternative to discuss.

@backslasht
Contributor

One of those reasons is the hard dependency on native compiled code, which often leads to portability issues due to glibc differences

@nknize - What is the general guideline for any native dependency? Are you saying OpenSearch will never support native dependencies going forward?

Move the ZstdCodec and ZstdNoDictCodec out from being a default module into an optional location (e.g., either an optional plugin or library - note that the BlobStoreRepository compression is already in as an optional library, but it's packaged and included by default so we still need to figure out how to make that optional).

What is the intent of this? By moving to optional, are you suggesting that users use Zstd at their own risk? Zstd is providing 30% reduction in storage size and latencies as good as best_speed. What is the alternative to this?

@nknize
Collaborator Author

nknize commented Aug 18, 2023

Are you saying OpenSearch will never support native dependencies going forward?

As plugins, sure. But likely not as default modules. Plugins are just like modules except users have to opt in by installing them.

By moving to optional, are you suggesting that users use Zstd at their own risk?

Depends on how you define "risk". ldd errors due to libc issues are certainly a "risk". An alternative to the jni implementation would be to use the pure Java solution. Then we could include it as a module without portability risks.

Zstd is providing 30% reduction in storage size and latencies as good as best_speed.

That's only part of the picture. Retrieval time per 10k docs is nearly five times slower than BEST_SPEED. So it's not without its tradeoffs.

@sarthakaggarwal97
Contributor

sarthakaggarwal97 commented Aug 18, 2023

@nknize @backslasht sharing some search numbers from my runs with NYC Taxis

Cluster Configuration:

  1. Server Setup
    a. One data node, r5.2xlarge, backed by an EBS GP3 volume of size 100 GB.
    b. Three master nodes, r5.xlarge instance type.

  2. Benchmarking Setup
    a. The benchmark was run on a single node of type c5.4xlarge, backed by an EBS GP3 volume of size 500 GB.
    b. Number of clients: 16; bulk size: 1024.
    c. Workload: https://github.com/opensearch-project/opensearch-benchmark-workloads/tree/main/nyc_taxis

  3. Index Setup
    a. Number of shards: 1
    b. Number of replicas: 0

  4. Benchmarks

[image: benchmark results]

@backslasht
Contributor

@sarthakaggarwal97 - Can you add details about the cluster configuration for the run?

@backslasht
Contributor

Depends on how you define "risk". ldd errors due to libc issues are certainly a "risk". An alternative to the jni implementation would be to use the pure Java solution. Then we could include it as a module without portability risks.

Fair point!

@sarthakaggarwal97
Contributor

Can you add details about the cluster configuration for the run?

I've updated it in my previous comment alongside the numbers, thanks @backslasht

@reta
Collaborator

reta commented Aug 18, 2023

@sarthakaggarwal97 sadly the latency charts do not reflect memory leaks (especially the native one)

An alternative to the jni implementation would be to use the pure java solution.

@nknize afaik the pure Java implementation exists and was evaluated with terrible performance (compared to ZSTD-jni)

@nknize
Collaborator Author

nknize commented Aug 18, 2023

...sharing some search numbers from my runs with NYC Taxis

@sarthakaggarwal97 can you add two things to your nyc_taxi benchmark run:

  1. this term -> date_histogram -> top_hits agg in #1647?
  2. "store": true to the vendor_name field

The reasoning is: regressions have historically occurred when decompressing stored fields and I don't see any stored fields in the nyc_taxis workload. We should make this part of the regular benchmark to prevent us from missing another regression.
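
The two requested additions could look roughly like the request bodies below, written as Python dicts for easy serialization. This is a sketch under assumptions: the `text` type for `vendor_name`, the `vendor_id` and `pickup_datetime` fields, the monthly interval, and the sizes are placeholders, and the aggregation is not the exact request from #1647.

```python
import json

# Mapping fragment that marks vendor_name as a stored field.
# The "text" type is an assumption about the workload's mapping.
stored_field_mapping = {
    "properties": {
        "vendor_name": {"type": "text", "store": True}
    }
}

# terms -> date_histogram -> top_hits.
# Field names, interval, and sizes are placeholders for illustration.
top_hits_request = {
    "size": 0,
    "aggs": {
        "by_vendor": {
            "terms": {"field": "vendor_id", "size": 10},
            "aggs": {
                "per_month": {
                    "date_histogram": {
                        "field": "pickup_datetime",
                        "calendar_interval": "month",
                    },
                    "aggs": {
                        "sample_docs": {
                            "top_hits": {
                                "size": 10,
                                "stored_fields": ["vendor_name"],
                            }
                        }
                    },
                }
            },
        }
    },
}

print(json.dumps(stored_field_mapping, indent=2))
print(json.dumps(top_hits_request, indent=2))
```

Because `top_hits` fetches documents for every bucket, each (vendor, month) pair triggers stored-field reads, which is the decompression path the benchmark would need to exercise.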

...afaik the pure Java implementation exists and was evaluated with terrible performance (compared to ZSTD-jni)

@reta Do you have that benchmark handy? I haven't dug around for it and the AirCompressor README touts "...typically 10-40% faster than the JNI wrapper for the native libraries." The only thing I quickly dug up was the comment justifying the purpose of the library, which aligns with our purpose of avoiding strict native dependencies.

@sarthakaggarwal97 It's also worth running luceneutil's StoredFieldsBenchmark with the ZStdCodec. We could do this in a separate repo and even pull in the AirCompressor pure java implementation for comparison.

@reta
Collaborator

reta commented Aug 18, 2023

@reta Do you have that benchmark handy?

Hm ... I've definitely seen it somewhere; there are mentions in apache/lucene#9784 (comment) and in #3354 for LZ4 (not applicable to ZSTD).

There are benchmarks available in the repo, I will run them: https://github.com/airlift/aircompressor/blob/master/src/test/java/io/airlift/compress/benchmark/CompressionBenchmark.java

@mgodwan
Member

mgodwan commented Aug 18, 2023

The reasoning is: regressions have historically occurred (#1647 (comment)) when decompressing stored fields and I don't see any stored fields in the nyc_taxis workload. We should make this part of the regular benchmark to prevent us from missing another regression.

@nknize Shouldn't _source cover the stored field as it is returned for the configured default/range queries in nyc taxis?

@reta reta reopened this Aug 18, 2023
@reta
Collaborator

reta commented Aug 18, 2023

Sorry, this issue was closed automatically after #9431.

@nknize
Collaborator Author

nknize commented Aug 18, 2023

Shouldn't _source cover the stored field as it is returned for the configured default/range queries in nyc taxis?

@mgodwan For queries, _source is only fetched for the first 10 results. TopHitsAggregator is of particular interest (and adversarial) because it executes the fetchPhase for topDocs.size() when the aggregation is built. This is why the regression linked above wasn't caught before the release: the benchmark workloads didn't test for these potentially "evil" aggregations.

@reta
Collaborator

reta commented Aug 18, 2023

@nknize I've run the aircompressor benchmarks locally on my machine (i7-10750H × 12, 64 GB, Linux):

Here is the compression throughput (in memory):

  compress    airlift_zstd            canterbury/alice29.txt          56,746   101.3MB/s ±   796.4kB/s ( 0.77%) (N = 30, α = 99.9%)
  compress    airlift_zstd            canterbury/asyoulik.txt         50,753    76.0MB/s ±  9315.6kB/s (11.97%) (N = 30, α = 99.9%)
  compress    airlift_zstd            canterbury/cp.html               8,566    93.3MB/s ±    15.5MB/s (16.64%) (N = 30, α = 99.9%)
  compress    airlift_zstd            canterbury/fields.c              3,427   102.8MB/s ±    25.6MB/s (24.94%) (N = 30, α = 99.9%)
  compress    airlift_zstd            canterbury/grammar.lsp           1,327    85.0MB/s ±    12.5MB/s (14.73%) (N = 30, α = 99.9%)
  compress    airlift_zstd            canterbury/kennedy.xls         112,690   184.5MB/s ±  5800.4kB/s ( 3.07%) (N = 30, α = 99.9%)
  compress    airlift_zstd            canterbury/lcet10.txt          140,687   104.2MB/s ±  3184.5kB/s ( 2.99%) (N = 30, α = 99.9%)
  compress    airlift_zstd            canterbury/plrabn12.txt        191,230    79.8MB/s ±  2585.3kB/s ( 3.16%) (N = 30, α = 99.9%)
  compress    airlift_zstd            canterbury/ptt5                 54,581   262.0MB/s ±    20.9MB/s ( 7.97%) (N = 30, α = 99.9%)
  compress    airlift_zstd            canterbury/sum                  13,408    99.3MB/s ±  4520.0kB/s ( 4.45%) (N = 30, α = 99.9%)
  compress    airlift_zstd            canterbury/xargs.1               1,838    92.5MB/s ±  6027.6kB/s ( 6.36%) (N = 30, α = 99.9%)
  compress    airlift_zstd            silesia/dickens              3,666,832    99.5MB/s ±  2951.9kB/s ( 2.90%) (N = 30, α = 99.9%)
  compress    airlift_zstd            silesia/mozilla             18,577,375   124.3MB/s ±  7819.5kB/s ( 6.14%) (N = 30, α = 99.9%)
  compress    airlift_zstd            silesia/mr                   3,560,793   115.2MB/s ±  4011.2kB/s ( 3.40%) (N = 30, α = 99.9%)
  compress    airlift_zstd            silesia/nci                  2,889,641   400.0MB/s ±    15.0MB/s ( 3.75%) (N = 30, α = 99.9%)
  compress    airlift_zstd            silesia/ooffice              3,147,867    92.3MB/s ±  8095.0kB/s ( 8.56%) (N = 30, α = 99.9%)
  compress    airlift_zstd            silesia/osdb                 3,515,524   131.7MB/s ±  4193.8kB/s ( 3.11%) (N = 30, α = 99.9%)
  compress    airlift_zstd            silesia/reymont              1,958,308   139.9MB/s ±  1997.9kB/s ( 1.39%) (N = 30, α = 99.9%)
  compress    airlift_zstd            silesia/samba                5,097,907   187.4MB/s ±  7248.4kB/s ( 3.78%) (N = 30, α = 99.9%)
  compress    airlift_zstd            silesia/sao                  5,591,044    62.2MB/s ±    10.1MB/s (16.25%) (N = 30, α = 99.9%)
  compress    airlift_zstd            silesia/webster             12,108,729   105.5MB/s ±  9211.8kB/s ( 8.53%) (N = 30, α = 99.9%)
  compress    airlift_zstd            silesia/x-ray                6,229,560    57.9MB/s ±  7637.8kB/s (12.87%) (N = 30, α = 99.9%)
  compress    airlift_zstd            silesia/xml                    641,486   240.9MB/s ±    21.1MB/s ( 8.74%) (N = 30, α = 99.9%)
  compress    airlift_zstd            calgary/bib                     37,329    98.0MB/s ± 10154.2kB/s (10.12%) (N = 30, α = 99.9%)
  compress    airlift_zstd            calgary/book1                  307,645    85.5MB/s ±  6029.9kB/s ( 6.89%) (N = 30, α = 99.9%)
  compress    airlift_zstd            calgary/book2                  205,814   106.1MB/s ±  5918.6kB/s ( 5.45%) (N = 30, α = 99.9%)
  compress    airlift_zstd            calgary/geo                     69,249    61.4MB/s ±  1123.1kB/s ( 1.79%) (N = 30, α = 99.9%)
  compress    airlift_zstd            calgary/news                   139,728    89.0MB/s ±  8312.1kB/s ( 9.12%) (N = 30, α = 99.9%)
  compress    airlift_zstd            calgary/obj1                    10,786    98.6MB/s ±  4930.9kB/s ( 4.88%) (N = 30, α = 99.9%)
  compress    airlift_zstd            calgary/obj2                    83,275   116.6MB/s ±  3592.6kB/s ( 3.01%) (N = 30, α = 99.9%)
  compress    airlift_zstd            calgary/paper1                  19,694   106.4MB/s ±  3150.5kB/s ( 2.89%) (N = 30, α = 99.9%)
  compress    airlift_zstd            calgary/paper2                  30,900    88.2MB/s ±  8110.2kB/s ( 8.98%) (N = 30, α = 99.9%)
  compress    airlift_zstd            calgary/paper3                  18,932    90.5MB/s ±  5466.5kB/s ( 5.90%) (N = 30, α = 99.9%)
  compress    airlift_zstd            calgary/paper4                   5,760    96.8MB/s ±  1023.9kB/s ( 1.03%) (N = 30, α = 99.9%)
  compress    airlift_zstd            calgary/paper5                   5,246    98.5MB/s ±  3130.5kB/s ( 3.10%) (N = 30, α = 99.9%)
  compress    airlift_zstd            calgary/paper6                  14,268   105.9MB/s ±  2828.0kB/s ( 2.61%) (N = 30, α = 99.9%)
  compress    airlift_zstd            calgary/pic                     54,581   315.1MB/s ±  3718.4kB/s ( 1.15%) (N = 30, α = 99.9%)
  compress    airlift_zstd            calgary/progc                   14,421    94.3MB/s ±  8889.6kB/s ( 9.20%) (N = 30, α = 99.9%)
  compress    airlift_zstd            calgary/progl                   17,877   147.8MB/s ±    10.5MB/s ( 7.12%) (N = 30, α = 99.9%)
  compress    airlift_zstd            calgary/progp                   12,362   160.3MB/s ±  2177.4kB/s ( 1.33%) (N = 30, α = 99.9%)
  compress    airlift_zstd            calgary/trans                   20,790   186.2MB/s ±  3830.5kB/s ( 2.01%) (N = 30, α = 99.9%)
  compress    airlift_zstd            artificial/a.txt                    14  1050.6kB/s ±    38.1kB/s ( 3.63%) (N = 30, α = 99.9%)
  compress    airlift_zstd            artificial/aaa.txt                  26  1813.0MB/s ±    88.1MB/s ( 4.86%) (N = 30, α = 99.9%)
  compress    airlift_zstd            artificial/alphabet.txt             50  1797.3MB/s ±    58.5MB/s ( 3.25%) (N = 30, α = 99.9%)
  compress    airlift_zstd            artificial/random.txt           75,421   317.3MB/s ±  3277.7kB/s ( 1.01%) (N = 30, α = 99.9%)
  compress    airlift_zstd            artificial/uniform_ascii.bin        8,842   226.9MB/s ±    20.2MB/s ( 8.91%) (N = 30, α = 99.9%)
  compress    airlift_zstd            large/bible.txt              1,183,732   129.2MB/s ±  6002.6kB/s ( 4.54%) (N = 30, α = 99.9%)
  compress    airlift_zstd            large/E.coli                 1,413,593   142.7MB/s ±  2199.7kB/s ( 1.51%) (N = 30, α = 99.9%)
  compress    airlift_zstd            large/world192.txt             659,802   142.2MB/s ±  3032.7kB/s ( 2.08%) (N = 30, α = 99.9%)
  compress    airlift_zstd            geo.protodata                   14,096   344.6MB/s ±    14.6MB/s ( 4.24%) (N = 30, α = 99.9%)
  compress    airlift_zstd            house.jpg                      126,974   259.0MB/s ±  9267.6kB/s ( 3.49%) (N = 30, α = 99.9%)
  compress    airlift_zstd            html                            14,928   276.9MB/s ±  8902.3kB/s ( 3.14%) (N = 30, α = 99.9%)
  compress    airlift_zstd            kppkn.gtb                       41,647   152.9MB/s ±  8482.8kB/s ( 5.42%) (N = 30, α = 99.9%)
  compress    airlift_zstd            mapreduce-osdi-1.pdf            75,523   351.6MB/s ±  5871.6kB/s ( 1.63%) (N = 30, α = 99.9%)
  compress    airlift_zstd            urls.10K                       185,565   146.7MB/s ±  1657.5kB/s ( 1.10%) (N = 30, α = 99.9%)

vs

  compress    zstd_jni                canterbury/alice29.txt          56,271   160.6MB/s ±  3078.0kB/s ( 1.87%) (N = 30, α = 99.9%)
  compress    zstd_jni                canterbury/asyoulik.txt         50,363   154.6MB/s ±  1365.1kB/s ( 0.86%) (N = 30, α = 99.9%)
  compress    zstd_jni                canterbury/cp.html               8,465   232.3MB/s ±  8557.1kB/s ( 3.60%) (N = 30, α = 99.9%)
  compress    zstd_jni                canterbury/fields.c              3,379   280.2MB/s ±    10.6MB/s ( 3.79%) (N = 30, α = 99.9%)
  compress    zstd_jni                canterbury/grammar.lsp           1,290   264.9MB/s ±  4227.8kB/s ( 1.56%) (N = 30, α = 99.9%)
  compress    zstd_jni                canterbury/kennedy.xls         111,742   415.8MB/s ±  6385.0kB/s ( 1.50%) (N = 30, α = 99.9%)
  compress    zstd_jni                canterbury/lcet10.txt          139,324   166.4MB/s ±  6885.7kB/s ( 4.04%) (N = 30, α = 99.9%)
  compress    zstd_jni                canterbury/plrabn12.txt        190,276   115.1MB/s ±    15.7MB/s (13.62%) (N = 30, α = 99.9%)
  compress    zstd_jni                canterbury/ptt5                 54,446   521.6MB/s ±    44.8MB/s ( 8.59%) (N = 30, α = 99.9%)
  compress    zstd_jni                canterbury/sum                  13,375   228.5MB/s ±  6798.6kB/s ( 2.91%) (N = 30, α = 99.9%)
  compress    zstd_jni                canterbury/xargs.1               1,800   231.7MB/s ±  4311.6kB/s ( 1.82%) (N = 30, α = 99.9%)
  compress    zstd_jni                silesia/dickens              3,631,607   164.6MB/s ±  1239.1kB/s ( 0.74%) (N = 30, α = 99.9%)
  compress    zstd_jni                silesia/mozilla             18,487,478   254.5MB/s ±  8180.9kB/s ( 3.14%) (N = 30, α = 99.9%)
  compress    zstd_jni                silesia/mr                   3,547,800   194.6MB/s ±  2049.6kB/s ( 1.03%) (N = 30, α = 99.9%)
  compress    zstd_jni                silesia/nci                  2,843,995   690.5MB/s ±  7290.3kB/s ( 1.03%) (N = 30, α = 99.9%)
  compress    zstd_jni                silesia/ooffice              3,143,604   153.3MB/s ±  8214.5kB/s ( 5.23%) (N = 30, α = 99.9%)
  compress    zstd_jni                silesia/osdb                 3,506,444   221.6MB/s ±  7585.6kB/s ( 3.34%) (N = 30, α = 99.9%)
  compress    zstd_jni                silesia/reymont              1,942,004   200.3MB/s ±  9344.7kB/s ( 4.56%) (N = 30, α = 99.9%)
  compress    zstd_jni                silesia/samba                4,976,446   309.1MB/s ±    30.8MB/s ( 9.97%) (N = 30, α = 99.9%)
  compress    zstd_jni                silesia/sao                  5,551,154   123.5MB/s ±    14.9MB/s (12.10%) (N = 30, α = 99.9%)
  compress    zstd_jni                silesia/webster             11,981,183   189.5MB/s ±  7174.5kB/s ( 3.70%) (N = 30, α = 99.9%)
  compress    zstd_jni                silesia/x-ray                6,085,291   124.5MB/s ±  2117.3kB/s ( 1.66%) (N = 30, α = 99.9%)
  compress    zstd_jni                silesia/xml                    639,134   502.7MB/s ±  9095.5kB/s ( 1.77%) (N = 30, α = 99.9%)
  compress    zstd_jni                calgary/bib                     37,046   192.5MB/s ±  4947.9kB/s ( 2.51%) (N = 30, α = 99.9%)
  compress    zstd_jni                calgary/book1                  305,543   144.0MB/s ±  3083.5kB/s ( 2.09%) (N = 30, α = 99.9%)
  compress    zstd_jni                calgary/book2                  203,941   178.5MB/s ±  4210.9kB/s ( 2.30%) (N = 30, α = 99.9%)
  compress    zstd_jni                calgary/geo                     69,219   114.7MB/s ±  4773.0kB/s ( 4.06%) (N = 30, α = 99.9%)
  compress    zstd_jni                calgary/news                   138,021   166.0MB/s ±  5224.0kB/s ( 3.07%) (N = 30, α = 99.9%)
  compress    zstd_jni                calgary/obj1                    10,772   198.7MB/s ±  7577.9kB/s ( 3.72%) (N = 30, α = 99.9%)
  compress    zstd_jni                calgary/obj2                    83,359   182.4MB/s ±  5750.3kB/s ( 3.08%) (N = 30, α = 99.9%)
  compress    zstd_jni                calgary/paper1                  19,511   176.0MB/s ±  5514.1kB/s ( 3.06%) (N = 30, α = 99.9%)
  compress    zstd_jni                calgary/paper2                  30,618   168.0MB/s ±  3833.8kB/s ( 2.23%) (N = 30, α = 99.9%)
  compress    zstd_jni                calgary/paper3                  18,736   151.9MB/s ±  6142.8kB/s ( 3.95%) (N = 30, α = 99.9%)
  compress    zstd_jni                calgary/paper4                   5,679   184.6MB/s ±  4219.1kB/s ( 2.23%) (N = 30, α = 99.9%)
  compress    zstd_jni                calgary/paper5                   5,152   195.2MB/s ±  2426.9kB/s ( 1.21%) (N = 30, α = 99.9%)
  compress    zstd_jni                calgary/paper6                  14,029   185.4MB/s ±  5817.8kB/s ( 3.07%) (N = 30, α = 99.9%)
  compress    zstd_jni                calgary/pic                     54,446   576.1MB/s ±  7905.9kB/s ( 1.34%) (N = 30, α = 99.9%)
  compress    zstd_jni                calgary/progc                   14,167   194.4MB/s ±    11.7MB/s ( 6.02%) (N = 30, α = 99.9%)
  compress    zstd_jni                calgary/progl                   17,581   271.3MB/s ±  8018.9kB/s ( 2.89%) (N = 30, α = 99.9%)
  compress    zstd_jni                calgary/progp                   12,149   289.7MB/s ±  3417.5kB/s ( 1.15%) (N = 30, α = 99.9%)
  compress    zstd_jni                calgary/trans                   20,595   331.0MB/s ±  3422.0kB/s ( 1.01%) (N = 30, α = 99.9%)
  compress    zstd_jni                artificial/a.txt                    10  1307.6kB/s ±    38.1kB/s ( 2.91%) (N = 30, α = 99.9%)
  compress    zstd_jni                artificial/aaa.txt                  22  4320.9MB/s ±   389.5MB/s ( 9.01%) (N = 30, α = 99.9%)
  compress    zstd_jni                artificial/alphabet.txt             46  5031.7MB/s ±   358.4MB/s ( 7.12%) (N = 30, α = 99.9%)
  compress    zstd_jni                artificial/random.txt           75,048   979.2MB/s ±    40.3MB/s ( 4.11%) (N = 30, α = 99.9%)
  compress    zstd_jni                artificial/uniform_ascii.bin        8,838   614.6MB/s ±  4906.5kB/s ( 0.78%) (N = 30, α = 99.9%)
  compress    zstd_jni                large/bible.txt              1,173,315   214.6MB/s ±  2139.8kB/s ( 0.97%) (N = 30, α = 99.9%)
  compress    zstd_jni                large/E.coli                 1,392,186   208.4MB/s ±  4167.2kB/s ( 1.95%) (N = 30, α = 99.9%)
  compress    zstd_jni                large/world192.txt             650,815   217.8MB/s ±  6117.0kB/s ( 2.74%) (N = 30, α = 99.9%)
  compress    zstd_jni                geo.protodata                   14,079   682.8MB/s ±    29.8MB/s ( 4.37%) (N = 30, α = 99.9%)
  compress    zstd_jni                house.jpg                      126,970   881.7MB/s ±    28.2MB/s ( 3.20%) (N = 30, α = 99.9%)
  compress    zstd_jni                html                            14,798   512.1MB/s ±    16.3MB/s ( 3.18%) (N = 30, α = 99.9%)
  compress    zstd_jni                kppkn.gtb                       40,850   258.4MB/s ±    15.3MB/s ( 5.94%) (N = 30, α = 99.9%)
  compress    zstd_jni                mapreduce-osdi-1.pdf            75,594   515.8MB/s ±    35.0MB/s ( 6.78%) (N = 30, α = 99.9%)
  compress    zstd_jni                urls.10K                       183,492   235.6MB/s ±  5904.0kB/s ( 2.45%) (N = 30, α = 99.9%)
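
To compare the two compression tables row by row, the throughput column can be parsed and divided. The sketch below uses three row pairs copied from the tables above; the `parse_row` and `speedups` helpers are hypothetical names, and the treatment of `kB` and `MB` as decimal multiples is an assumption, since the benchmark output does not say.

```python
import re

# Three row pairs copied verbatim from the compression tables above.
ROWS = """
  compress    airlift_zstd            canterbury/alice29.txt          56,746   101.3MB/s ±   796.4kB/s ( 0.77%) (N = 30, α = 99.9%)
  compress    zstd_jni                canterbury/alice29.txt          56,271   160.6MB/s ±  3078.0kB/s ( 1.87%) (N = 30, α = 99.9%)
  compress    airlift_zstd            silesia/mozilla             18,577,375   124.3MB/s ±  7819.5kB/s ( 6.14%) (N = 30, α = 99.9%)
  compress    zstd_jni                silesia/mozilla             18,487,478   254.5MB/s ±  8180.9kB/s ( 3.14%) (N = 30, α = 99.9%)
  compress    airlift_zstd            artificial/a.txt                    14  1050.6kB/s ±    38.1kB/s ( 3.63%) (N = 30, α = 99.9%)
  compress    zstd_jni                artificial/a.txt                    10  1307.6kB/s ±    38.1kB/s ( 2.91%) (N = 30, α = 99.9%)
"""

# Assumption: kB and MB are decimal multiples of each other.
UNIT_TO_MB = {"kB": 1e-3, "MB": 1.0}
THROUGHPUT = re.compile(r"([\d.]+)(kB|MB)/s")


def parse_row(line):
    """Return (operation, algorithm, file name, throughput in MB/s)."""
    operation, algorithm, name, _size, throughput = line.split()[:5]
    value, unit = THROUGHPUT.fullmatch(throughput).groups()
    return operation, algorithm, name, float(value) * UNIT_TO_MB[unit]


def speedups(rows, baseline="airlift_zstd", candidate="zstd_jni"):
    """Map (operation, file name) to candidate / baseline throughput."""
    table = {}
    for line in rows.splitlines():
        if line.strip():
            operation, algorithm, name, mb_per_s = parse_row(line)
            table.setdefault((operation, name), {})[algorithm] = mb_per_s
    return {
        key: by_algo[candidate] / by_algo[baseline]
        for key, by_algo in table.items()
        if baseline in by_algo and candidate in by_algo
    }


for (operation, name), ratio in sorted(speedups(ROWS).items()):
    print(f"{operation} {name}: zstd_jni / airlift_zstd = {ratio:.2f}")
```

On these three rows the JNI binding compresses between about 1.2 and 2.0 times faster than the pure Java implementation. Pasting the full tables into `ROWS` gives the per-file picture.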

Same for decompression:

  decompress  airlift_zstd            canterbury/alice29.txt          56,746   456.4MB/s ±    21.9MB/s ( 4.80%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            canterbury/asyoulik.txt         50,753   490.3MB/s ±    21.7MB/s ( 4.43%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            canterbury/cp.html               8,566   634.3MB/s ±  4931.3kB/s ( 0.76%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            canterbury/fields.c              3,427   551.0MB/s ±    13.6MB/s ( 2.46%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            canterbury/grammar.lsp           1,327   405.1MB/s ±    11.3MB/s ( 2.79%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            canterbury/kennedy.xls         112,690   663.5MB/s ±  4931.5kB/s ( 0.73%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            canterbury/lcet10.txt          140,687   559.4MB/s ±    24.5MB/s ( 4.38%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            canterbury/plrabn12.txt        191,230   481.3MB/s ±    15.7MB/s ( 3.27%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            canterbury/ptt5                 54,581  1135.9MB/s ±    16.2MB/s ( 1.43%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            canterbury/sum                  13,408   590.6MB/s ±    21.0MB/s ( 3.56%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            canterbury/xargs.1               1,838   372.8MB/s ±    13.3MB/s ( 3.56%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            silesia/dickens              3,666,832   503.4MB/s ±    12.1MB/s ( 2.40%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            silesia/mozilla             18,577,375   562.5MB/s ±    18.5MB/s ( 3.29%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            silesia/mr                   3,560,793   525.6MB/s ±    29.3MB/s ( 5.58%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            silesia/nci                  2,889,641  1141.0MB/s ±    42.9MB/s ( 3.76%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            silesia/ooffice              3,147,867   404.2MB/s ±    22.0MB/s ( 5.45%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            silesia/osdb                 3,515,524   659.0MB/s ±    19.0MB/s ( 2.88%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            silesia/reymont              1,958,308   593.1MB/s ±    26.1MB/s ( 4.40%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            silesia/samba                5,097,907   755.9MB/s ±    25.1MB/s ( 3.32%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            silesia/sao                  5,591,044   430.6MB/s ±    17.3MB/s ( 4.02%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            silesia/webster             12,108,729   564.4MB/s ±    30.1MB/s ( 5.33%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            silesia/x-ray                6,229,560   374.6MB/s ±    15.2MB/s ( 4.06%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            silesia/xml                    641,486  1064.9MB/s ±    27.1MB/s ( 2.55%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            calgary/bib                     37,329   606.5MB/s ±    12.7MB/s ( 2.09%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            calgary/book1                  307,645   468.3MB/s ±    15.2MB/s ( 3.25%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            calgary/book2                  205,814   554.2MB/s ±    20.1MB/s ( 3.62%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            calgary/geo                     69,249   420.8MB/s ±    24.7MB/s ( 5.88%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            calgary/news                   139,728   580.7MB/s ±  4045.2kB/s ( 0.68%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            calgary/obj1                    10,786   512.7MB/s ±    15.8MB/s ( 3.08%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            calgary/obj2                    83,275   563.4MB/s ±  1953.8kB/s ( 0.34%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            calgary/paper1                  19,694   573.3MB/s ±  3471.3kB/s ( 0.59%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            calgary/paper2                  30,900   534.0MB/s ±    21.7MB/s ( 4.07%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            calgary/paper3                  18,932   515.5MB/s ±    11.8MB/s ( 2.30%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            calgary/paper4                   5,760   437.3MB/s ±  3951.0kB/s ( 0.88%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            calgary/paper5                   5,246   433.8MB/s ±  7494.2kB/s ( 1.69%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            calgary/paper6                  14,268   534.5MB/s ±    19.4MB/s ( 3.63%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            calgary/pic                     54,581  1140.7MB/s ±    14.2MB/s ( 1.24%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            calgary/progc                   14,421   584.6MB/s ±  5636.1kB/s ( 0.94%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            calgary/progl                   17,877   693.9MB/s ±    26.3MB/s ( 3.79%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            calgary/progp                   12,362   762.3MB/s ±    13.3MB/s ( 1.74%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            calgary/trans                   20,790   824.6MB/s ±  9032.4kB/s ( 1.07%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            artificial/a.txt                    14    42.3MB/s ±  1725.0kB/s ( 3.99%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            artificial/aaa.txt                  26  4627.3MB/s ±    69.2MB/s ( 1.50%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            artificial/alphabet.txt             50  4338.0MB/s ±    15.6MB/s ( 0.36%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            artificial/random.txt           75,421   494.6MB/s ±    21.7MB/s ( 4.38%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            artificial/uniform_ascii.bin        8,842   504.1MB/s ±  5961.5kB/s ( 1.15%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            large/bible.txt              1,183,732   620.7MB/s ±  6002.5kB/s ( 0.94%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            large/E.coli                 1,413,593   525.6MB/s ±    15.6MB/s ( 2.96%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            large/world192.txt             659,802   677.7MB/s ±    12.6MB/s ( 1.86%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            geo.protodata                   14,096  1342.8MB/s ±  8880.6kB/s ( 0.65%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            house.jpg                      126,974  9324.2MB/s ±   482.0MB/s ( 5.17%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            html                            14,928  1163.7MB/s ±    90.9MB/s ( 7.81%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            kppkn.gtb                       41,647   536.1MB/s ±    45.3MB/s ( 8.45%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            mapreduce-osdi-1.pdf            75,523   988.7MB/s ±   164.2MB/s (16.61%) (N = 30, α = 99.9%)
  decompress  airlift_zstd            urls.10K                       185,565   538.9MB/s ±   115.6MB/s (21.45%) (N = 30, α = 99.9%)

vs

  decompress  zstd_jni                canterbury/alice29.txt          56,271   855.6MB/s ±    48.4MB/s ( 5.66%) (N = 30, α = 99.9%)
  decompress  zstd_jni                canterbury/asyoulik.txt         50,363  1019.2MB/s ±    32.6MB/s ( 3.20%) (N = 30, α = 99.9%)
  decompress  zstd_jni                canterbury/cp.html               8,465  1122.5MB/s ±    46.1MB/s ( 4.11%) (N = 30, α = 99.9%)
  decompress  zstd_jni                canterbury/fields.c              3,379   958.6MB/s ±    23.5MB/s ( 2.45%) (N = 30, α = 99.9%)
  decompress  zstd_jni                canterbury/grammar.lsp           1,290   671.6MB/s ±    14.1MB/s ( 2.10%) (N = 30, α = 99.9%)
  decompress  zstd_jni                canterbury/kennedy.xls         111,742  1150.7MB/s ±    46.3MB/s ( 4.02%) (N = 30, α = 99.9%)
  decompress  zstd_jni                canterbury/lcet10.txt          139,324  1120.4MB/s ±    32.3MB/s ( 2.88%) (N = 30, α = 99.9%)
  decompress  zstd_jni                canterbury/plrabn12.txt        190,276   930.2MB/s ±    31.1MB/s ( 3.34%) (N = 30, α = 99.9%)
  decompress  zstd_jni                canterbury/ptt5                 54,446  2138.5MB/s ±    52.5MB/s ( 2.45%) (N = 30, α = 99.9%)
  decompress  zstd_jni                canterbury/sum                  13,375  1273.0MB/s ±  9259.5kB/s ( 0.71%) (N = 30, α = 99.9%)
  decompress  zstd_jni                canterbury/xargs.1               1,800   682.0MB/s ± 10101.9kB/s ( 1.45%) (N = 30, α = 99.9%)
  decompress  zstd_jni                silesia/dickens              3,631,607   882.2MB/s ±    41.8MB/s ( 4.74%) (N = 30, α = 99.9%)
  decompress  zstd_jni                silesia/mozilla             18,487,478  1220.1MB/s ±    46.6MB/s ( 3.82%) (N = 30, α = 99.9%)
  decompress  zstd_jni                silesia/mr                   3,547,800  1101.9MB/s ±    26.3MB/s ( 2.38%) (N = 30, α = 99.9%)
  decompress  zstd_jni                silesia/nci                  2,843,995  1832.1MB/s ±    98.4MB/s ( 5.37%) (N = 30, α = 99.9%)
  decompress  zstd_jni                silesia/ooffice              3,143,604   905.3MB/s ±    44.9MB/s ( 4.96%) (N = 30, α = 99.9%)
  decompress  zstd_jni                silesia/osdb                 3,506,444  1351.4MB/s ±    93.7MB/s ( 6.93%) (N = 30, α = 99.9%)
  decompress  zstd_jni                silesia/reymont              1,942,004  1040.3MB/s ±    49.9MB/s ( 4.80%) (N = 30, α = 99.9%)
  decompress  zstd_jni                silesia/samba                4,976,446  1717.6MB/s ±    15.4MB/s ( 0.90%) (N = 30, α = 99.9%)
  decompress  zstd_jni                silesia/sao                  5,551,154  1019.3MB/s ±  3488.9kB/s ( 0.33%) (N = 30, α = 99.9%)
  decompress  zstd_jni                silesia/webster             11,981,183  1034.2MB/s ±    39.7MB/s ( 3.83%) (N = 30, α = 99.9%)
  decompress  zstd_jni                silesia/x-ray                6,085,291   904.8MB/s ±  8415.4kB/s ( 0.91%) (N = 30, α = 99.9%)
  decompress  zstd_jni                silesia/xml                    639,134  2177.3MB/s ±    11.8MB/s ( 0.54%) (N = 30, α = 99.9%)
  decompress  zstd_jni                calgary/bib                     37,046  1164.0MB/s ±    41.6MB/s ( 3.57%) (N = 30, α = 99.9%)
  decompress  zstd_jni                calgary/book1                  305,543   990.4MB/s ±  9123.6kB/s ( 0.90%) (N = 30, α = 99.9%)
  decompress  zstd_jni                calgary/book2                  203,941  1133.0MB/s ±  5780.3kB/s ( 0.50%) (N = 30, α = 99.9%)
  decompress  zstd_jni                calgary/geo                     69,219  1149.4MB/s ±    32.7MB/s ( 2.85%) (N = 30, α = 99.9%)
  decompress  zstd_jni                calgary/news                   138,021  1220.7MB/s ±  7415.9kB/s ( 0.59%) (N = 30, α = 99.9%)
  decompress  zstd_jni                calgary/obj1                    10,772  1274.9MB/s ±  4696.8kB/s ( 0.36%) (N = 30, α = 99.9%)
  decompress  zstd_jni                calgary/obj2                    83,359  1008.0MB/s ±    39.7MB/s ( 3.94%) (N = 30, α = 99.9%)
  decompress  zstd_jni                calgary/paper1                  19,511  1042.7MB/s ±    66.6MB/s ( 6.39%) (N = 30, α = 99.9%)
  decompress  zstd_jni                calgary/paper2                  30,618   910.4MB/s ±   124.5MB/s (13.67%) (N = 30, α = 99.9%)
  decompress  zstd_jni                calgary/paper3                  18,736   912.9MB/s ±   108.4MB/s (11.87%) (N = 30, α = 99.9%)
  decompress  zstd_jni                calgary/paper4                   5,679   728.4MB/s ±    70.5MB/s ( 9.67%) (N = 30, α = 99.9%)
  decompress  zstd_jni                calgary/paper5                   5,152   758.3MB/s ±    65.2MB/s ( 8.60%) (N = 30, α = 99.9%)
  decompress  zstd_jni                calgary/paper6                  14,029  1112.4MB/s ±  7842.4kB/s ( 0.69%) (N = 30, α = 99.9%)
  decompress  zstd_jni                calgary/pic                     54,446  1922.9MB/s ±   206.2MB/s (10.72%) (N = 30, α = 99.9%)
  decompress  zstd_jni                calgary/progc                   14,167   664.5MB/s ±    71.1MB/s (10.70%) (N = 30, α = 99.9%)
  decompress  zstd_jni                calgary/progl                   17,581   987.9MB/s ±   124.1MB/s (12.56%) (N = 30, α = 99.9%)
  decompress  zstd_jni                calgary/progp                   12,149  1029.5MB/s ±   129.4MB/s (12.57%) (N = 30, α = 99.9%)
  decompress  zstd_jni                calgary/trans                   20,595  1297.8MB/s ±   145.4MB/s (11.20%) (N = 30, α = 99.9%)
  decompress  zstd_jni                artificial/a.txt                    10  1952.4kB/s ±   228.0kB/s (11.68%) (N = 30, α = 99.9%)
  decompress  zstd_jni                artificial/aaa.txt                  22  6132.1MB/s ±   174.4MB/s ( 2.84%) (N = 30, α = 99.9%)
  decompress  zstd_jni                artificial/alphabet.txt             46  3642.8MB/s ±   113.3MB/s ( 3.11%) (N = 30, α = 99.9%)
  decompress  zstd_jni                artificial/random.txt           75,048  1281.0MB/s ±    86.5MB/s ( 6.76%) (N = 30, α = 99.9%)
  decompress  zstd_jni                artificial/uniform_ascii.bin        8,838  1206.7MB/s ±    45.6MB/s ( 3.78%) (N = 30, α = 99.9%)
  decompress  zstd_jni                large/bible.txt              1,173,315  1041.1MB/s ±    49.5MB/s ( 4.75%) (N = 30, α = 99.9%)
  decompress  zstd_jni                large/E.coli                 1,392,186   995.2MB/s ±    37.7MB/s ( 3.79%) (N = 30, α = 99.9%)
  decompress  zstd_jni                large/world192.txt             650,815  1241.6MB/s ±    56.8MB/s ( 4.57%) (N = 30, α = 99.9%)
  decompress  zstd_jni                geo.protodata                   14,079  2804.8MB/s ±   107.9MB/s ( 3.85%) (N = 30, α = 99.9%)
  decompress  zstd_jni                house.jpg                      126,970    30.9GB/s ±  1466.7MB/s ( 4.64%) (N = 30, α = 99.9%)
  decompress  zstd_jni                html                            14,798  2444.8MB/s ±    78.2MB/s ( 3.20%) (N = 30, α = 99.9%)
  decompress  zstd_jni                kppkn.gtb                       40,850   948.1MB/s ±    52.7MB/s ( 5.56%) (N = 30, α = 99.9%)
  decompress  zstd_jni                mapreduce-osdi-1.pdf            75,594  6662.2MB/s ±   108.4MB/s ( 1.63%) (N = 30, α = 99.9%)
  decompress  zstd_jni                urls.10K                       183,492  1503.3MB/s ±    53.2MB/s ( 3.54%) (N = 30, α = 99.9%)

As per these numbers, aircompressor looks significantly slower.
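To put a rough number on "significantly slower", the sketch below computes the zstd_jni to airlift_zstd decompression throughput ratio for a handful of inputs that appear in both listings above. The figures are copied by hand from those rows (MB/s); the selection of inputs and the use of a geometric mean are illustrative choices, not part of the original benchmark.

```python
import math

# Decompression throughput in MB/s as (airlift_zstd, zstd_jni),
# copied from the rows above. Only a small, hand-picked subset of
# inputs is included; this is not the full benchmark table.
THROUGHPUT_MB_S = {
    "calgary/book1": (468.3, 990.4),
    "calgary/book2": (554.2, 1133.0),
    "calgary/geo": (420.8, 1149.4),
    "calgary/news": (580.7, 1220.7),
    "large/bible.txt": (620.7, 1041.1),
    "large/E.coli": (525.6, 995.2),
    "large/world192.txt": (677.7, 1241.6),
    "artificial/alphabet.txt": (4338.0, 3642.8),
}

# Ratio > 1 means zstd_jni decompresses faster than airlift_zstd.
ratios = {
    name: jni / airlift
    for name, (airlift, jni) in THROUGHPUT_MB_S.items()
}

# Geometric mean is the usual way to average ratios.
geo_mean = math.exp(sum(math.log(r) for r in ratios.values()) / len(ratios))

for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{name:26s} zstd_jni is {ratio:.2f}x airlift_zstd")
print(f"geometric mean over {len(ratios)} inputs: {geo_mean:.2f}x")
```

Note that the ratio is not uniformly in favor of zstd_jni: on the tiny artificial/alphabet.txt input the pure-Java implementation comes out ahead in these rows.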

@nknize
Collaborator Author

nknize commented Aug 18, 2023

@reta what is the fourth column?

@andrross
Member

andrross commented Aug 31, 2023

That is the module, right?

@reta The differences are:

  • modules are installed by default in the min distribution; in my proposal the zstd plugin would not be.
  • modules are in fact not removable (please correct me if I'm wrong...I just did a test on a min install by running sudo ./bin/opensearch-plugin remove lang-painless and it failed)

@backslasht
Contributor

How about the following as a potential compromise position: move the ZSTD implementation to a plugin, and also update the distribution build to install the plugin by default in our distribution artifacts. This has the following benefits:

* Mitigates concerns around portability and developer pain of the min distribution, as it continues to be pure Java

* Mitigates concerns of breaking users who have started using ZSTD, as the distribution continues to behave identically to the 2.9 release

* Adds the option to _uninstall_ the plugin from any installation where the native binding is problematic

We would make this change and ship it in the upcoming 2.10 release.

+1. Sounds good to me.

@reta
Collaborator

reta commented Aug 31, 2023

@reta The differences are:

Got it, thanks @andrross for highlighting these subtle diffs, 👍 to follow your plan

@dblock
Member

dblock commented Aug 31, 2023

update the distribution build to install the plugin by default in our distribution artifacts

this is a very smart idea, @andrross, solves for semver for the default distribution with plugins at least

@anastead

anastead commented Sep 1, 2023

Thank you @andrross , much appreciated

@CEHENKLE
Member

CEHENKLE commented Sep 1, 2023

@reta The differences are:

Got it, thanks @andrross for highlighting these subtle diffs, 👍 to follow your plan

Ship it from me too :)

@nknize
Collaborator Author

nknize commented Sep 1, 2023

I'm just catching up on this after being out. In the spirit of full transparency to the community, there have been a lot of back-channel and closed-door discussions around this topic that should really take place in the open for traceability and community involvement.

With that I have a few comments and questions:

modules are in fact not removable...

They are not. Beating a dead horse described and defined in #5910, modules are installed by default and cannot be uninstalled.

modules are installed by default in the min distribution, in my proposal they would not be.

?? I'm confused by this. Are you proposing switching the features in the modules/ directory to no longer be installed by default? I don't think you are but this sentence seems to indicate so? If that's the case, strong -1 for that

...update the distribution build to install the plugin by default

It appears, per @dblock's comment, that somewhere in the bundle assemble workflow all downstream OpenSearch (ODFE) developed plugins (external repos) are in fact installed by default. Looks like that's a relic of the original ODFE days. IMO that's a strange decision with interesting (and potentially harmful) side effects... but that's a separate conversation.

With respect to this issue +1 to move zstd compression to a plugin in 2.10 per the original proposal.

However, it sounds like you're suggesting we update the bundle build of the "ecosystem" artifacts to install the new core custom-compression plugin by default. Sure, -0 from me to not block that if that's what folks want to do (it achieves the loosely argued semver comments). But I have to ask, if the downstream repo plugins get installed by default in the bundle anyway, why not just move the entire zstd implementation to a new downstream custom-codecs repository where we could explore snappy and other options separately? Or move it into one of the other existing repos like common-utils, which other repos already take a dependency on and which doesn't pollute the core with the native linked code? It accomplishes the same thing. I'm sure FUD just went up here with this comment of mine, so I'll remind everyone that I'm -0 on this part of the proposal.

@andrross
Member

andrross commented Sep 1, 2023

modules are installed by default in the min distribution, in my proposal they would not be.

?? I'm confused by this. Are you proposing switching the features in the modules/ directory to no longer be installed by default? I don't think you are but this sentence seems to indicate so? If that's the case, strong -1 for that

@nknize Wow, this was horrible phrasing/a typo, and I'm absolutely not proposing to change the behavior of modules. I have updated the comment in place as well, but what I meant to say is that in my proposal the zstd compression plugin would not be installed by default in the min distribution, which it would be if it were made into a module.

@nknize
Collaborator Author

nknize commented Sep 1, 2023

...I'm absolutely not proposing to change the behavior of modules.

Phew. +1

Just going to reiterate my suggestion: "...if the downstream repo plugins get installed by default in the bundle anyway, why not just move the entire zstd implementation to a new downstream custom-codecs repository where we could explore snappy and other options separately?" To me this is emerging as a more attractive option per the plugin PR discussion. I'd suggest a new downstream repo called custom-compression, or just put it in common-utils. This way users can load ZStd, snappy, and any other custom compressions they want through SPI. That was the intention of my Refactor Compressors for Extensibility PR anyway.

Update: for the sake of time it would be quicker to move it to common-utils. This way we wouldn't have to wait for a new repo to be approved and created by the "Admin" group.

@andrross
Member

andrross commented Sep 1, 2023

move the entire zstd implementation to a new downstream ... repository

@nknize I like this idea. I know you don't love the "install plugin in distribution by default" behavior but it does allow us to move plugins around without changing the end user experience.

Update: just saw the comment about moving to common-utils. I'm good with that too. I like keeping the OpenSearch repo pure Java for developer experience reasons. Also I think we are free to refactor and move things around in the future without changing the end user experience as cited above.

@reta
Collaborator

reta commented Sep 1, 2023

+1 to moving it out of core to a separate custom-codecs repo. For compression, bundling into common-utils may not be a good idea (that project exists to provide common utilities for plugins), whereas compression is useful in plugins, extensions, and core as well. custom-compression as a separate repo is better, I think.

@backslasht
Contributor

bundling into common-utils may not be a good idea

+1, I agree, let's not pollute common-utils just because it is easier to move it there.

custom-compression as a separate repo is better I think

These can be different plugins for different compressions, and users are free to choose what they like (zstd, snappy). Not sure if bundling everything together is a good idea.

@nknize
Collaborator Author

nknize commented Sep 3, 2023

+1, I agree, let's not pollute common-utils just because it is easier to move it there.

custom-compression as a separate repo is better I think

I agree with this in the long term. The problem I worry about is the time it will take to get a new repo requested -> approved -> created. @hyandell any idea how long this will take? I know it will have to go through trademark checks and all that. We have a feature release scheduled for 2.10 on 9/19, a little over two weeks away. If you think we can't get the repo created in the next couple of days, let us know. Any later than that won't give us enough time to onboard a new repo to the bundle.

If that's the case then I suggest putting it in common-utils until the new repo can be created and everything moved without rushing.

@xiaoshi2013

Great, I'm really looking forward to it.

@dblock
Member

dblock commented Sep 5, 2023

The problem I worry about is the time it will take to get a new repo requested -> approved -> created. @hyandell any idea how long this will take? I know it will have to go through trademark checks and all that. We have a feature release scheduled for 2.10 on 9/19, a little over two weeks away. If you think we can't get the repo created in the next couple of days, let us know. Any later than that won't give us enough time to onboard a new repo to the bundle.

I've seen empty repos created in ~24h.

@dblock
Member

dblock commented Sep 5, 2023

It appears, per @dblock's comment, that somewhere in the bundle assemble workflow all downstream OpenSearch (ODFE) developed plugins (external repos) are in fact installed by default. Looks like that's a relic of the original ODFE days. IMO that's a strange decision with interesting (and potentially harmful) side effects... but that's a separate conversation.

It has been like this since OpenSearch v1, and is not a relic. Most users want(ed) a fully featured OpenSearch with security enabled. This is done in the bundle assemble workflow.

@nknize
Collaborator Author

nknize commented Sep 6, 2023

It has been like this since OpenSearch v1, and is not a relic...This is done in the bundle assemble workflow.

Markdowns don't tell me anything. Code does.

Most users want(ed) a fully featured OpenSearch with security enabled.

+1 for security by default. -1 to install every OpenSearch plugin known to man (or at least in opensearch-project). Some day security may make its way into core. But many users do not want the saddlebags of the bundle plugins.

@reta
Collaborator

reta commented Sep 6, 2023

@nknize @dblock @andrross since 2.10 is about to be released soon, we won't be doing 2.9.1 right? (to confirm)

@andrross
Member

andrross commented Sep 6, 2023

@reta That's my understanding, we will not release 2.9.1

@andrross
Member

andrross commented Sep 7, 2023

I believe changes are required in the distribution assembly logic to make the "install by default" thing actually happen. I have opened an issue here: opensearch-project/opensearch-build#3971
