Version 3.7.0 Release Notes
Compatible with OpenSearch and OpenSearch Dashboards version 3.7.0
Features
- Add dynamic properties support for pattern-based field definitions without cluster state mapping updates (#20816)
- Add pluggable data format engine with DataFormatAwareEngine for multi-format indexing (#21181)
- Add Lucene engine implementation for pluggable data formats (#21299)
- Add merge support for Parquet data format plugin via streaming k-way merge sort (#21079)
- Add directory and IndexInput layers for WritableWarm tiered storage (#21178)
- Add server-side implementation for tiering status APIs (GetTieringStatus and ListTieringStatus) (#21220)
- Add server-side implementation for HotToWarm, WarmToHot, and CancelTiering APIs (#21295)
- Add prefetch settings and stored fields prefetch for WritableWarm tiered storage (#21285)
- Add slow logs, per-query metrics, and migration metrics for WritableWarm tiered storage (#21332)
- Add module wiring and integration tests for WritableWarm tiered storage (#21427)
- Add tiered object storage crate for warm node file routing (#21204)
- Add event-driven scheduler and stage execution for analytics engine (#21242)
- Add coordinator-side DataFusion reduce with streaming Arrow batches (#21356)
- Add distributed aggregation with partial/final mode for analytics engine (#21457)
- Add distributed join planning and execution for analytics engine (#21639)
- Add PPL `append` command support with multi-child stage runtime for Union (#21474)
- Add PPL `dedup` command support via ROW_NUMBER window function (#21622)
- Add PPL `eventstats` and `streamstats` window function support (#21734)
- Add PPL `top` and `rare` command support via window functions (#21593)
- Add PPL `parse` command with regex mode via Rust UDFs (#21573)
- Add PPL `rex` command with sed and extract modes (#21550)
- Add PPL `spath` command with auto-extract mode via json_extract_all UDF (#21664)
- Add 7 PPL JSON scalar functions to analytics engine route (#21513)
- Add 23 PPL datetime scalar functions to analytics engine route (#21556)
- Add 14 additional PPL datetime functions (Wave A) including strftime, date_format, maketime (#21582)
- Add 30+ PPL math scalar functions to analytics engine (#21520)
- Add PPL string scalar functions to analytics engine (18 functions) (#21543)
- Add PPL conditional functions (coalesce, isempty, isblank, case, if, ifnull) to analytics engine (#21643)
- Add PPL conversion scalar functions (num, auto, memk, rmcomma, dur2sec, ctime, mktime) to analytics engine (#21628)
- Add PPL cryptographic functions (md5, sha1, sha2, crc32) to analytics engine (#21611)
- Add PPL array constructor and 8 multivalue functions to analytics engine (#21554)
- Add PPL bucketing scalars (span_bucket, width_bucket, minspan_bucket, range_bucket) (#21621)
- Add PPL TAKE, FIRST, LAST, LIST, VALUES aggregate functions (#21731)
- Add Lucene filter delegation from DataFusion for full-text search predicates (#21555)
- Add performance delegation to Lucene for selective filter predicates (#21701)
- Add native Arrow transport path with zero-copy transfer for stream transport (#21253)
- Stream Arrow batches on data-node fragment execution path (#21418)
- Add support for `extra_fields` outside `_source` indexing for improved vector ingestion throughput (#20635)
- Add gRPC support for Min, Max, and Terms aggregations (#21205)
- Add partition strategy setting for flexible shard-to-partition mapping in pull-based ingestion (#21165)
- Add SplitToFieldsProcessor for distributing split values to target fields (#21216)
- Add native memory based admission control for transport request throttling (#21191)
- Add native memory search backpressure for off-heap query cancellation (#21647)
- Add unified native allocator framework for Arrow allocations with elastic rebalancing (#21703)
- Add on-demand jemalloc heap profiling support via JMX CLI tool (#21599)
- Add `search.max_buckets` to workload group settings for per-tenant bucket limits (#21721)
- Add additional search settings and `override_request_values` to workload management groups (#21523)
- Add Hunspell dictionary hot-reload support via `_refresh_search_analyzers` API (#21559)
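The Parquet merge entry above (#21079) refers to a streaming k-way merge sort. As a rough illustration of that general technique only, the Python sketch below lazily merges several already-sorted runs with a min-heap, holding at most one pending row per run. The function name `k_way_merge` and the `(sort_key, payload)` row shape are assumptions made for this example; they do not describe the plugin's actual code.

```python
import heapq
from typing import Any, Iterable, Iterator, List, Tuple

# Hypothetical row shape for this sketch: (sort_key, payload).
Row = Tuple[Any, Any]


def k_way_merge(runs: List[Iterable[Row]]) -> Iterator[Row]:
    """Lazily merge k individually sorted runs into one sorted stream.

    Only one pending row per run sits in the heap, so memory stays O(k)
    no matter how many rows each run contains.
    """
    iterators = [iter(run) for run in runs]
    heap = []
    for run_idx, it in enumerate(iterators):
        first = next(it, None)
        if first is not None:
            # run_idx breaks ties on equal keys, so payloads are never compared.
            heap.append((first[0], run_idx, first))
    heapq.heapify(heap)

    while heap:
        _key, run_idx, row = heapq.heappop(heap)
        yield row
        # Refill from the run that just produced the smallest row.
        nxt = next(iterators[run_idx], None)
        if nxt is not None:
            heapq.heappush(heap, (nxt[0], run_idx, nxt))


runs = [[(1, "a"), (4, "d")], [(2, "b"), (3, "c")], []]
print([key for key, _ in k_way_merge(runs)])  # [1, 2, 3, 4]
```

Because rows are pulled on demand, a consumer can write merged output incrementally instead of materializing every run in memory. Python's `heapq.merge` offers the same behavior in one call; the explicit loop is shown here only to make the heap mechanics visible.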
Enhancements
- Add adaptive query budget for DataFusion engine with bounded memory and improved throughput (#21695)
- Add DynamicLimitPool for runtime memory pool limit changes in DataFusion (#21286)
- Add configurable coordinator buffer limit for per-query Arrow allocator (#21726)
- Add CPU task cancellation for DataFusion queries (#21560)
- Add IO task cancellation support for DataFusion queries (#21531)
- Add DataFusion logical and physical plan logging at DEBUG level (#21646)
- Add dynamic settings for indexed query execution path (#21522)
- Add dedicated `analytics_scheduler` thread pool to prevent coordinator deadlock (#21771)
- Add dedicated `analytics_reduce` thread pool for coordinator reduce drains (#21800)
- Add native memory stats and task cancellation stats to node stats API (#21637)
- Add `current_application_duration_ms` to cluster state download stats in node stats API (#20922)
- Add segments and segment stats support for DataFormatAwareEngine (#21696)
- Add DataFormat-aware NRT replication engine and remote-store wiring (#21311)
- Add DataFormat-aware shallow snapshot v2 support (#21742)
- Add DataFormat-aware read-only engine for warm primaries with tiering service improvements (#21720)
- Add dynamic mapping support for pluggable data formats (#21444)
- Add delete execution engine abstraction for DataFormatAwareEngine (#21313)
- Add cluster-scope defaults for pluggable dataformat settings (#21435)
- Add indexing support for metadata fields in pluggable data formats (#21585)
- Add Lucene merge support for pluggable data format composite engine (#21422)
- Add composite merge handler and merge policy for data-format-aware engine (#21128)
- Add sort-on-refresh for composite engine with cross-format row-ID consistency (#21468)
- Add warm+format directory wiring with per-format tiered directory routing (#21361)
- Add block cache SPI and Foyer plugin for warm nodes (#21530)
- Add REST API paths for block cache prune and detailed file cache stats (#21705)
- Add cancellation checkpoints in field data loading and aggregation paths (#21318)
- Add `queryTimeout` to IndexSearcher for KNN vector search timeout enforcement (#21316)
- Add index-level authorization to analytics engine via ActionFilter dispatch (#21789)
- Add `/_analytics/ppl/_explain` endpoint with stage profiling (#21660)
- Add relevance function support (match_phrase, multi_match, query_string, etc.) to analytics engine (#21562)
- Add relevance functions optional parameter support and new functions (wildcard_query, query, match_all) (#21661)
- Add filter pushdown rules and Calcite rule metrics for profiling (#21684)
- Add per-column encoding and compression configuration for Parquet data format (#21665)
- Avoid repeated encoding and compression for sort column writes in Parquet (#21464)
- Add pipeline execution metrics to PollingIngestStats for pull-based ingestion (#21024)
- Add batching for persistent task cluster service to reduce cluster manager load (#21245)
- Refactor BitsetFilterCache to node-level cache with configurable size limit (#21179)
- Skip zone awareness when `auto_expand_replicas` is set to all (#21217)
- Relax field-level meta validation constraints to allow any number of entries with string values (#20578)
- Deprecate boolean constructor of FetchSourceContext in favor of static constants (#21235)
- Add validation and deprecation warnings for ambiguous `_source` filtering (#21203)
- Speed up Painless Script Engine initialization by ~10% (#21463)
- Fix accumulation of file sizes when multiple files share the same extension in segment stats (#21000)
- Improve native memory admission control precision with auto-derived budget and JVM non-heap subtraction (#21749)
- Tighten DataFusion memory guard with RSS-based hard guard to prevent OOM under concurrent load (#21814)
- Support `indices_boost_2` array format for gRPC search (#21300)
- Add configurable Kafka metadata timeout for pull-based ingestion (#21425)
- Expose tokio-metrics as DataFusion plugin stats (#21303)
- Add Lucene FFM callbacks to task resource tracking (#21610)
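The segment stats entry above (#21000) concerns file sizes that were not accumulated correctly when several files share an extension. A minimal Python sketch of the intended behavior follows, assuming a hypothetical list of `(file_name, size_in_bytes)` pairs and a hypothetical helper named `sizes_by_extension`: sizes are added into a running total per extension, so a later file with the same extension cannot replace an earlier one. This is an illustration of the accumulation pattern, not the OpenSearch implementation.

```python
import os
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def sizes_by_extension(files: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Sum file sizes per extension (hypothetical helper for illustration)."""
    totals: Dict[str, int] = defaultdict(int)
    for name, size in files:
        # os.path.splitext("_0.cfs") -> ("_0", ".cfs"); drop the leading dot.
        ext = os.path.splitext(name)[1].lstrip(".")
        # Accumulate with += instead of assigning, so "_0.cfs" and "_1.cfs" both count.
        totals[ext] += size
    return dict(totals)


files = [("_0.cfs", 100), ("_1.cfs", 250), ("_0.si", 10)]
print(sizes_by_extension(files))  # {'cfs': 350, 'si': 10}
```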
Bug Fixes
- Fix YAML parser corrupting string values that resemble booleans after Jackson 3.x migration (#21294)
- Fix `map_unmapped_fields_as_text` lost after dynamic mapping update in PercolatorFieldMapper (#21301)
- Fix O(n²) removeAll in remote translog metadata cleanup causing CPU spikes (#21350)
- Fix `Rounding.isUTC()` to recognize UTC timezone aliases for date histogram optimization (#21221)
- Fix NPE in QueryPhaseResultConsumer when all shards fail (#21158)
- Fix bulk request hang when index is deleted during primary phase (#21305)
- Fix deadlock between engineMutex and writeLock during index close and engine reset (#21404)
- Fix FlightOutboundHandler clearing caller's ThreadContext (#21167)
- Fix `IndicesRequestCacheCleanupIT` flakiness by removing too-short assertBusy timeouts (#21494)
- Fix negative fielddata stats by guarding against stale removals after shard reallocation (#21667)
- Fix `half_float` ingest writing wrong fp16 bit pattern in Parquet (#21783)
- Fix StringView buffer bloat in DataFusion stream_next FFI export causing 435x data amplification (#21753)
- Fix Utf8View/Utf8 schema mismatch panic in indexed parquet path (#21826)
- Fix memory leak in `transport-reactor-netty4` plugin with persistent connections (#21788)
- Fix `ExitablePostingsEnum` to extend `FilterPostingsEnum` for proper delegation (#21558)
- Fix local recovery from flush for DataFormatAwareEngine (#21553)
- Fix safe-commit info and replication checksum for DFA shards (#21787)
- Fix DFA recovery failures: file-handle leak and reset-path crash (#21759)
- Handle null scripted metric combine results (#21534)
- Demote "No resource usage stats available for node" log from WARN to DEBUG (#21638)
- Fix pull-based ingestion document mapper usage to reflect mapping updates (#21183)
- Fix pull-based ingestion consumer factory to be stateless and prevent race conditions (#21652)
- Fix pull-based ingestion multi-threaded writer batchStartPointer computation (#21697)
- Fix Netty4Http3ServerTransport to use configured HeaderVerifier and Decompressor instances (#21281)
- Convert varchar to str in analytics engine Project operations to fix DataFusion type errors (#21794)
- Fix `microsecond()` function and add timestamp lower-bound validation in analytics engine (#21793)
- Enforce write blocks for DFA hot-to-warm tiering to survive DiskThresholdMonitor removal (#21828)
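The remote translog cleanup fix above (#21350) removes an O(n²) removeAll. A common source of that cost is checking membership against a list once per element: in Java, `ArrayList.removeAll(c)` calls `c.contains` for each element, so passing another `List` makes the pass quadratic. The Python sketch below contrasts that pattern with building a hash set once, which brings the pass down to roughly O(n + m). The metadata file names are hypothetical, and the sketch illustrates the general technique rather than the actual OpenSearch change.

```python
from typing import Iterable, List


def remove_all_quadratic(items: List[str], to_remove: List[str]) -> List[str]:
    # Each `not in` scans the whole list: O(len(items) * len(to_remove)).
    return [x for x in items if x not in to_remove]


def remove_all_linear(items: List[str], to_remove: Iterable[str]) -> List[str]:
    # Build the lookup set once; each membership test is then O(1) on average.
    drop = set(to_remove)
    return [x for x in items if x not in drop]


metadata = [f"metadata__{i}" for i in range(6)]  # hypothetical file names
stale = ["metadata__1", "metadata__4"]
print(remove_all_linear(metadata, stale))
# ['metadata__0', 'metadata__2', 'metadata__3', 'metadata__5']
```

Both functions return the same result and preserve the original order; only the cost of the membership checks differs.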
Maintenance
- Bump Netty to 4.2.14.Final (#21772)
- Update Jackson to 2.21.3 / 3.1.3 (#21493)
- Update ASM to 9.10 (#21764)
- Update OpenTelemetry to 1.62.0 and SemConv to 1.41.0 (#21595)
- Update Project Reactor to 3.8.5 and Reactor Netty to 1.3.5 (#21226)
- Update bundled JDK to JDK 25.0.3 (#21353)
- Update log4j2 to 2.25.4 (#21416)
- Update `httpclient5` to 5.6.1 (#21441)
- Bump commons-configuration2 from 2.14.0 to 2.15.0 (#21806)
- Bump org.apache.commons:commons-configuration2 from 2.13.0 to 2.14.0 (#21213)
- Bump com.google.protobuf from 0.9.6 to 0.10.0 (#21291)
- Bump org.apache.hadoop:hadoop-minicluster from 3.4.2 to 3.5.0 (#21138)
- Bump org.codehaus.woodstox:stax2-api from 4.2.2 to 4.3.0 (#21137)
- Bump org.jline:jline from 4.0.0 to 4.0.14 (#21471)
- Bump org.jsoup:jsoup from 1.22.1 to 1.22.2 (#21290)
- Bump com.nimbusds:nimbus-jose-jwt from 10.8 to 10.9 (#21214)
- Remove Unsafe class injection from Java agent (#21542)
- Replace mimalloc with jemalloc as global allocator for native sandbox plugins (#21497)
- Upgrade DataFusion to v53 and Arrow to v58 (#21590)
- Pin GitHub Actions to commit SHAs for supply chain security (#21808)
- Update FIPS bootstrap check to use OpenSearch env var instead of BouncyCastle system property (#21415)