fix!: disable auto-cleanup by default #6755
Merged
The WriteParams default enabled auto-cleanup with interval=20 and older_than=14d. On a busy writer every 20th commit ran a full cleanup pass (listing + reading every manifest and the data/tx/index subtrees) yet deleted nothing until a version was 14 days old, adding multi-second per-commit latency on object stores that grows with version count. Default WriteParams::auto_cleanup to None so new datasets opt in explicitly. This also aligns the Rust default with the Python binding (already None) and the Java binding (inherits the Rust default). Closes lance-format#6728
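The cadence described above can be modeled with a small std-only sketch. This is not the Lance implementation: `CleanupPolicy`, `cleanup_due`, `simulate`, and `days` are hypothetical names, and the sketch assumes the hook fires whenever the commit number is a multiple of the interval and that a pass deletes every version strictly older than the threshold.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the old default policy (interval=20, older_than=14d).
#[derive(Debug, Clone, Copy)]
struct CleanupPolicy {
    interval: u32,
    older_than: Duration,
}

/// Helper: whole days as a Duration.
fn days(n: u64) -> Duration {
    Duration::from_secs(n * 24 * 60 * 60)
}

/// Assumption: a cleanup pass runs when the commit number is a multiple of `interval`.
fn cleanup_due(commit: u32, policy: CleanupPolicy) -> bool {
    policy.interval > 0 && commit % policy.interval == 0
}

/// Simulates `commits` sequential commits spaced `spacing` apart.
/// Returns (cleanup passes run, versions deleted).
fn simulate(commits: u32, spacing: Duration, policy: CleanupPolicy) -> (u32, usize) {
    let mut passes = 0;
    let mut deleted = 0;
    let mut live: Vec<u32> = Vec::new(); // commit numbers of versions still on disk
    for commit in 1..=commits {
        live.push(commit);
        if cleanup_due(commit, policy) {
            // Every pass pays the inspection cost, whether or not it deletes anything.
            passes += 1;
            let before = live.len();
            // Keep versions whose age is within the retention window.
            live.retain(|&v| spacing * (commit - v) <= policy.older_than);
            deleted += before - live.len();
        }
    }
    (passes, deleted)
}
```

Under these assumptions, `simulate(60, Duration::from_secs(60), policy)` with `interval: 20` and `older_than: days(14)` runs three passes and deletes nothing, which is the pattern the description attributes to the old default on a busy writer.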
Contributor
This should be a breaking change of behavior, so I updated the title.
jackye1995 (Contributor) approved these changes on May 13, 2026
thanks for the fix, looks good to me
Summary
- Default `WriteParams::auto_cleanup` to `None` so new datasets do not opt into the periodic cleanup hook.
- Align the Rust default with the Python binding (already `None`) and the Java binding (inherits the Rust default).

Closes #6728
Why
With the old default, every 20th commit ran a full cleanup pass (listing `_versions/`, reading every manifest plus its index sidecar, and listing the `data/`, `_transactions/`, and `_indices/` subtrees) even when nothing was older than the 14-day default. A short S3 (us-east-1) benchmark of 60 sequential overwrites on a tiny dataset:

[Benchmark table not preserved; only the row-label fragment `skip_auto_cleanup=true)` survives.]

The 14-day window meant the periodic spike paid the inspection cost and deleted nothing in practice. Users who want auto-cleanup can still opt in by setting `WriteParams::auto_cleanup` at create time or by setting `lance.auto_cleanup.interval` / `lance.auto_cleanup.older_than` via `Dataset::update_config` on an existing dataset.
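The two opt-in paths can be sketched without the Lance crate. The types below are hypothetical mirrors, not the real `WriteParams` or its cleanup parameter type. Only the two config keys come from the description above, and the `"14d"` value format is an assumption borrowed from the `older_than=14d` shorthand in the commit message.

```rust
use std::collections::HashMap;

/// Hypothetical mirror of the cleanup settings; not the Lance type.
#[derive(Debug, Clone, PartialEq)]
struct AutoCleanupSketch {
    interval: u32,
    older_than: String, // value format is an assumption, e.g. "14d"
}

/// Hypothetical mirror of the write parameters with the new default.
/// `Default` yields `auto_cleanup: None`, so cleanup is strictly opt-in.
#[derive(Debug, Clone, Default, PartialEq)]
struct WriteParamsSketch {
    auto_cleanup: Option<AutoCleanupSketch>,
}

/// Path 1: opt in at create time by filling in the optional field.
fn opt_in_at_create(interval: u32, older_than: &str) -> WriteParamsSketch {
    WriteParamsSketch {
        auto_cleanup: Some(AutoCleanupSketch {
            interval,
            older_than: older_than.to_string(),
        }),
    }
}

/// Path 2: build the config entries for an existing dataset.
/// The keys are the ones named in the PR description; how they are
/// passed to `Dataset::update_config` is not shown here.
fn cleanup_config(settings: &AutoCleanupSketch) -> HashMap<String, String> {
    HashMap::from([
        (
            "lance.auto_cleanup.interval".to_string(),
            settings.interval.to_string(),
        ),
        (
            "lance.auto_cleanup.older_than".to_string(),
            settings.older_than.clone(),
        ),
    ])
}
```

Modeling the setting as an `Option` with a `None` default makes the absence of a cleanup policy the explicit baseline, so a writer only pays for cleanup passes it asked for.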