v7.0.0-rc.1
Pre-release
What's Changed
Breaking Changes 🛠
- feat!: make dataset object store access base-aware by @jackye1995 in #6647
- perf!: run scheduler initialize eagerly in async read_tasks by @westonpace in #6710
- fix!: disable auto-cleanup by default by @touch-of-grey in #6755
New Features 🎉
- feat: implement vector index details by @wjones127 in #6099
- feat(lsm): expose lsm api to python by @zhangyue19921010 in #6259
- feat: add branch/tag metadata maps and tag timestamps by @majin1102 in #6364
- feat: route partial-schema merge_insert through the v2 write path by @wombatu-kun in #6472
- feat: route find-or-create merge_insert through the v2 write path by @wombatu-kun in #6473
- feat: expose base scoped store bindings to java by @zhangyue19921010 in #6548
- feat: add unenforced_clustering_key to format spec by @beinan in #6552
- feat(spec): formally split catalog, namespace, table and index specs by @jackye1995 in #6566
- feat: support distributed bitmap index build by @zhangyue19921010 in #6598
- feat: support segmented btree indices by @beinan in #6605
- feat: expose FTS exec internals to enable distributed planning by @vivek-bharathan in #6648
- feat: opt-in env var to drop centroids from vector index stats by @westonpace in #6654
- feat: add write ahead log appender and tailer primitives by @touch-of-grey in #6669
- feat: unify WalAppender into ShardWriter via enable_memtable mode by @touch-of-grey in #6675
- feat: show scalar index type in ScalarIndexQuery plan output by @wkalt in #6677
- feat(namespace): add backfill and refresh operations by @jackye1995 in #6678
- feat(fts): implement CacheCodec for posting lists and positions by @wkalt in #6691
- feat(java): expose scan execution stats by @fangbo in #6712
- feat(mem_wal): flush active memtable on ShardWriter::close by @touch-of-grey in #6717
- feat(index): merge incremental inverted index segments by @BubbleCal in #6737
- feat: allow LSM scanner to read fresh tier without a base table by @hamersaw in #6749
- feat: add manifest version hint for fast latest-version lookup by @touch-of-grey in #6752
- feat(io): add shared-memory:// scheme for cross-component in-memory state by @hamersaw in #6753
- feat(mem_wal): add ShardWriter manual compaction APIs by @hamersaw in #6766
- feat(java): add Dataset.takeRows() for physical row ID access by @alex766 in #6772
- feat: expose uncommitted delete transactions by @ragnorc in #6781
- feat: support FTS index segment merging by @Xuanwo in #6790
- feat(python): support nested blob fields in to_pandas by @geruh in #6791
- feat(index): serializable cache for the BTree scalar index by @wjones127 in #6793
- feat: add Lance-native MemWAL HNSW for shard writer by @touch-of-grey in #6795
- feat(io): make shared-memory:// provider available outside tests by @hamersaw in #6805
- fix: reject changing the unenforced primary key once set by @touch-of-grey in #6810
- feat: support updating the unenforced clustering key via UpdateConfig by @touch-of-grey in #6812
- feat: builder-style MemWAL initialization API by @touch-of-grey in #6815
- feat(mem_wal): cache opened L0 flushed-generation datasets by @hamersaw in #6816
- feat(java): add Java bindings for MemWAL APIs by @touch-of-grey in #6833
- feat(rust): support update by _row_id by @yanghua in #6837
- feat: expose granular trace event targets by @beinan in #6853
- feat: add MemWAL sharding evaluator by @jackye1995 in #6854
- feat: expose multi-base config to Python and Java write_fragments API by @zhangyue19921010 in #6855
- feat(index): serializable cache for Bitmap and LabelList scalar indices by @wjones127 in #6874
- feat: create materialized view API by @rpgreen in #6891
Bug Fixes 🐛
- fix: set scope for junit-jupiter dependency to test by @yuqi1129 in #5576
- fix: branch_identifier unstable for legacy branches by @majin1102 in #6390
- fix: serialize version metadata through JNI and correct row-ID lookup by @ivscheianu in #6465
- fix(blobV2): blob.from_uri optional validation by @zhangyue19921010 in #6558
- fix: upgrade ethnum 1.5.2 to 1.5.3 and unpin nightly toolchain by @LuciferYang in #6578
- fix: drop stale has_all_zeros guard so block bitpack engages on rep/def by @pengw0048 in #6629
- fix(json): fix json object value fetch by integer by @dentiny in #6632
- fix(java-jni): make dispatcher thread a daemon to allow jvm to exit by @XuQianJin-Stars in #6633
- fix(core): map DataFusion SchemaError to InvalidInput by @jja725 in #6639
- fix: refresh last_updated metadata on Operation::Merge for rewritten fragments by @jerryjch in #6640
- fix: make vector distance schema nullable by @BubbleCal in #6649
- fix: propagate update_columns offsets and partial last_updated for RewriteColumns by @jerryjch in #6650
- fix: json extract using wildcard by @zhangyue19921010 in #6651
- fix(optimize): fix stale OCC read_version in distributed compaction to prevent row resurrection by @xiaguanglei in #6653
- fix: add more info when describe indices error by @wojiaodoubao in #6665
- fix: detect table metadata key conflicts in concurrent UpdateConfig by @jackye1995 in #6667
- fix: always do wal tailer cursor update by @touch-of-grey in #6673
- fix: add equals/hashCode to FullTextQuery concrete subclasses by @ivscheianu in #6674
- fix(core): stop scalar-index training scans from over-projecting Map siblings by @kushudai in #6679
- fix: swap dataset.rs warning argument order by @wkalt in #6683
- fix(python): add backfill/refresh wrappers to namespace classes by @jackye1995 in #6684
- fix: resolve system index types in describe_indices by @wkalt in #6685
- fix: avoid nested tokio runtime panic in AWS credential vendor by @jackye1995 in #6689
- fix: simplify column-map expressions before physical planning by @westonpace in #6698
- fix(java): honor user-specified distance type in VectorTrainer by @jiaoew1991 in #6704
- fix: sync unenforced_primary_key_position when field metadata updates by @touch-of-grey in #6706
- fix: do no work when optimize_indices is called but no new data has arrived by @westonpace in #6711
- fix(mem_wal): keep dispatcher and waiters alive on flush handler errors by @touch-of-grey in #6715
- fix(encoding): support sparse boolean lists in full-zip encoding by @shenganzhang in #6723
- fix(index): log vector training sampling progress by @hfutatzhanghb in #6724
- fix: scalar index config plugin lookup to be case-insensitive by @zhangyue19921010 in #6739
- fix(rust): avoid pushdown limit/offset for stable row id by @yanghua in #6740
- fix: use physical scan stream for update by @wojiaodoubao in #6741
- fix: use multipart-aware put for transaction file writes by @jackye1995 in #6750
- fix(mem_wal): advance visibility cursor on WAL durability, not indexing by @hamersaw in #6754
- fix: include view tag in FieldDataCacheKey by @wombatu-kun in #6758
- fix(mem_wal): return _distance from LSM vector search across mixed sources by @hamersaw in #6761
- fix(mem_wal): stop WAL replay from re-loading already-compacted entries by @hamersaw in #6767
- fix: complex all-null list struct decoding by @Xuanwo in #6771
- fix: set _row_created_at_version to new version for MERGE INTO INSERT rows by @jerryjch in #6774
- docs: correct typo in WAL entry filename suffix to '.arrow' by @jiengup in #6786
- fix(index): fix range query bound inclusive-ness by @dentiny in #6796
- fix: stop double-counting child CPU in node-with-children Exec plans by @brendanclement in #6799
- fix: decode constant list struct children by @Xuanwo in #6801
- fix(mem_wal): keep sealed memtables queryable until flush commits by @hamersaw in #6814
- fix(encoding): fail cleanly when blob data fails to load by @nsLance in #6817
- fix: make HNSW graph build deterministic to stabilize test_ann_prefilter by @wombatu-kun in #6818
- fix(python): pass batch size through fragment pandas export by @BubbleCal in #6829
- fix(mem_wal): allow append-only tables without primary keys by @touch-of-grey in #6848
- fix(mem_wal): dedupe duplicate primary keys in LSM point lookup by @touch-of-grey in #6880
- fix(mem_wal): exact PK dedup for LSM vector search by @touch-of-grey in #6881
Documentation 📚
- docs: update dataset source in llm_dataset_creation.md by @lhoestq in #5827
- docs(mem_wal): warn about stalled-writer race when WAL GC is added by @hamersaw in #6699
- docs: remove FROM community clause from DuckDB lance install command by @SiddiqueAhmad in #6727
- docs: correct the suffix of wal file and unify the term "shard_id" by @jiengup in #6807
- docs: document PR publishing requirements by @Xuanwo in #6870
Performance Improvements 🚀
- perf: avoid materializing RoaringBitmap::full() in fragment allow-list by @wkalt in #6664
- perf: reduce memory use when splitting IVF partitions by @wjones127 in #6687
- perf(index): use SIMD-dispatched l2_u8 in SQDistCalculator by @martji in #6692
- perf: use hnsw for memtable vector index by @touch-of-grey in #6701
- perf: reduce IO requests from loading bitmap index by @wjones127 in #6703
- perf: revert inline scheduling by @westonpace in #6709
- perf: fix O(N·K) slow row-id lookup on stable-row-id datasets by @hamersaw in #6716
Other Changes
- refactor(mem_wal): redesign FTS mem index for single-writer multi-reader by @touch-of-grey in #6726
- refactor: rename ShardSpec to ShardingSpec by @touch-of-grey in #6813
Full Changelog: release-root/7.0.0-beta.N...v7.0.0-rc.1