feat(delete): distributed fragment-scoped delete + execute_batch delete support#7
Open
sezruby wants to merge 2 commits into
Open
feat(delete): distributed fragment-scoped delete + execute_batch delete support#7sezruby wants to merge 2 commits into
sezruby wants to merge 2 commits into
Conversation
…te support Add a distributed / parallel delete: partition the target by fragment, delete a predicate's matches per fragment slice independently, and batch-commit the per-slice transactions as one atomic commit. This completes the documented `execute_batch` API (previously append-only) for delete transactions. Two independent pieces: - Producer: `DeleteBuilder::with_target_fragments(ids)` scopes the scan (via `Scanner::with_fragments`) and deletion application to a fragment subset, so each distributed task reads only its own slice of the target instead of re-scanning the whole dataset. Unknown fragment ids are rejected. - Combiner: `combine_delete_transactions` unions the per-slice deletion vectors (one merged deletion file per shared fragment, data files untouched), dedups whole-fragment deletions, and requires a shared predicate + read_version. `CommitBuilder::execute_batch` now accepts an all-delete batch and combines it; mixed-kind batches are still rejected. Because disjoint fragment slices can never tombstone the same physical row, the batch commit is conflict-free and equivalent to a single full delete of the same predicate over the whole dataset. An overlap check backstops the invariant. Tests cover equivalence-vs-full-delete, whole-fragment deletion, empty slices, unknown-id rejection, overlap/predicate-mismatch rejection, mixed-batch rejection, and scalar-index consistency; all multi-fragment. Adds `benches/fragment_scoped_delete.rs` (aggregate-work-across-N-tasks framing; ~6x on a 64-fragment / 8-task single-node round). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…, bench
Applies the code-review findings on the distributed delete:
- combine_delete_transactions now returns a CombinedDelete { transaction,
affected_rows, num_deleted_rows }. execute_batch_delete carries the
reconstructed affected_rows into the commit so a concurrent writer is rebased
at row granularity, matching a single DeleteBuilder::execute (previously the
batch committed with affected_rows=None and hard-failed under concurrency).
affected_rows is the delete delta vs read_version, computed by diffing each
touched fragment's post-delete deletion vector against its state at
read_version; its cardinality gives the aggregate num_deleted_rows.
- Since with_target_fragments requires disjoint slices, the combiner no longer
reads/unions deletion vectors or writes new deletion files. It concatenates
updated_fragments + deleted_fragment_ids and rejects any fragment id seen in
more than one transaction — covering whole-fragment overlaps too, which the
previous union-only check silently deduped.
- with_target_fragments rejects duplicate ids (previously double-counted rows
and emitted duplicate deleted_fragment_ids on the delete-everything path).
- execute_batch wraps the whole call (including the delete path's dataset
resolution + deletion-file reads) in the commit timeout; the arms call
execute_inner to avoid double-wrapping. Zero-timeout rejected up front.
- Bench: fresh dataset per iteration via iter_batched so uncommitted deletion
files don't accumulate in the store and bias full-scan (which writes N× more)
vs scoped. Corrected single-node round: ~5.8x.
- Python commit_batch docstring updated (append OR delete, not append-only).
- Tests: overlapping-whole-fragment rejection, duplicate-id rejection,
concurrent-delete rebase, num_deleted_rows/affected_rows equivalence vs a full
delete, scalar-index consistency via index_statistics, delete-path timeout;
negative tests now assert the error message, not just the InvalidInput variant.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Owner
Author
Review fixes applied (commit
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds a distributed / parallel delete to lance: partition the target by fragment, delete a predicate's matches per fragment slice independently (in parallel), and batch-commit the per-slice transactions as one atomic commit. Completes the documented
CommitBuilder::execute_batchAPI — previously append-only, with an in-code TODO — for delete transactions.Tracking issue: #8 · sibling of the fragment-scoped merge work #6.
Two independent pieces
Producer —
DeleteBuilder::with_target_fragments(ids)Scanner::with_fragments) and deletion application to a fragment subset, so each task reads only its own slice instead of re-scanning the whole dataset (O(N × target)→O(target)aggregate I/O).apply_deletionsnaturally stays in-slice. Only thepredicate == truefast-path is explicitly scoped.Combiner —
combine_delete_transactions+execute_batchDelete armwith_target_fragmentsrequires disjoint slices, each fragment is touched by exactly one task — so the combiner never reads or rewrites deletion files. It concatenatesupdated_fragments+deleted_fragment_ids, requires all inputs to share the samepredicate+read_version, and rejects any fragment id seen in more than one transaction (covers whole-fragment overlaps, not just partial updates).CombinedDelete { transaction, affected_rows, num_deleted_rows }.execute_batchcarries the reconstructedaffected_rowsinto the commit, so a concurrent writer is rebased at row granularity — exactly like a singleDeleteBuilder::execute.affected_rowsis the delete delta vsread_version(each touched fragment's post-delete deletion vector minus its state atread_version, or all live rows for a whole-fragment delete); its cardinality is the aggregatenum_deleted_rows.execute_batchnow accepts an all-delete batch; all-append keeps its existing fast path; mixed / other-kind batches are rejected. The configuredwith_timeoutbounds the entire call, including the delete path's dataset resolution + deletion-file reads.Correctness
Disjoint fragment slices can never tombstone the same physical row ⇒ every matching row is covered exactly once ⇒ the combined result is exactly equal to a single full delete of the same predicate over the whole dataset. The disjointness check backstops the invariant and errors if a caller ever passes overlapping slices.
Multi-base-safe: the
affected_rowsreconstruction reads via the base-awareread_dataset_deletion_fileand writes no deletion files, verified against the recently-merged upstream multi-base PRs (lance-format/lance lance-format#7610, lance-format#7609, lance-format#7608, lance-format#4580).Tests (all multi-fragment)
test_fragment_scoped_delete_equals_full_delete— split delete ≡ full delete (identical surviving rows, and combinernum_deleted_rows+affected_rowsmatch a single full delete); each slice's txn touches only its own fragments.test_fragment_scoped_delete_deletes_whole_fragment— whole-fragment delete lands indeleted_fragment_ids.test_fragment_scoped_delete_empty_slice— a slice matching nothing contributes an empty delete.test_batch_delete_rebases_against_concurrent_delete— batch delete rebases against a concurrent delete of a different fragment (provesaffected_rowsis carried).test_fragment_scoped_delete_with_scalar_index— scalar-index consistency asserted viaindex_statistics(not just a full-scan-fallback count).Benchmark
benches/fragment_scoped_delete.rs(full_scan_per_taskvsfragment_scoped_per_task, framed as aggregate work across N tasks; fresh dataset per iteration so uncommitted deletion files don't accumulate and bias the modes). A 64-fragment / 8-task single-node round: ~5.8× (≈84.5 ms → ≈14.6 ms).Verification
cargo fmt --all✅cargo clippy -p lance --tests --benches -- -D warnings→ 0 warnings ✅cargo test -p lance --lib dataset::write::→ 268 passed, 0 failed ✅with_target_fragmentsexample) → pass ✅Commits
feat(delete): distributed fragment-scoped delete + execute_batch delete support— initial implementation.fix(delete): address review — row-level rebase, disjointness, timeout, bench— applies an xhigh code review: row-level rebase via reconstructedaffected_rows, disjointness-based combiner (no DV union/rewrite), duplicate-id rejection, whole-batch timeout, unbiased bench, stronger tests, Python docstring.🤖 Generated with Claude Code