Skip to content

fix!: disable auto-cleanup by default#6755

Merged
jackye1995 merged 1 commit into
lance-format:mainfrom
touch-of-grey:TurnOffAutoClean
May 13, 2026
Merged

fix!: disable auto-cleanup by default#6755
jackye1995 merged 1 commit into
lance-format:mainfrom
touch-of-grey:TurnOffAutoClean

Conversation

@touch-of-grey
Copy link
Copy Markdown
Contributor

Summary

  • Default WriteParams::auto_cleanup to None so new datasets do not opt into the periodic cleanup hook.
  • Aligns the Rust default with the Python binding (already defaults to None) and the Java binding (inherits the Rust default).
  • Adds a regression test asserting the new default and updates the cleanup-tests fixture so the auto-cleanup behavioural tests explicitly enable it.

Closes #6728

Why

With the old default, every 20th commit ran a full cleanup pass — listing _versions/, reading every manifest plus its index sidecar, and listing the data/, _transactions/, _indices/ subtrees — even when nothing was older than the 14-day default. A short S3 (us-east-1) benchmark of 60 sequential overwrites on a tiny dataset:

Group mean max
ON, v % 20 == 0 (hook fires) 2548 ms 2671 ms
ON, v % 20 != 0 1180 ms 1390 ms
OFF (skip_auto_cleanup=true) 1196 ms 1456 ms

The 14-day window meant the periodic spike paid the inspection cost and deleted nothing in practice. Users who want auto-cleanup can still opt in by setting WriteParams::auto_cleanup at create time or by setting lance.auto_cleanup.interval / lance.auto_cleanup.older_than via Dataset::update_config on an existing dataset.

The WriteParams default enabled auto-cleanup with interval=20 and
older_than=14d. On a busy writer every 20th commit ran a full cleanup
pass (listing + reading every manifest and the data/tx/index subtrees)
yet deleted nothing until a version was 14 days old, adding multi-second
per-commit latency on object stores that grows with version count.

Default WriteParams::auto_cleanup to None so new datasets opt in
explicitly. This also aligns the Rust default with the Python binding
(already None) and the Java binding (inherits the Rust default).

Closes lance-format#6728
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@github-actions github-actions Bot added the bug Something isn't working label May 13, 2026
@jackye1995 jackye1995 changed the title fix: disable auto-cleanup by default fix!: disable auto-cleanup by default May 13, 2026
@jackye1995
Copy link
Copy Markdown
Contributor

this should be a breaking change of behavior, I updated the title

Copy link
Copy Markdown
Contributor

@jackye1995 jackye1995 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the fix, looks good to me

@codecov
Copy link
Copy Markdown

codecov Bot commented May 13, 2026

Codecov Report

❌ Patch coverage is 95.45455% with 1 line in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
rust/lance/src/dataset/cleanup.rs 93.75% 0 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

@jackye1995 jackye1995 merged commit 1f358d1 into lance-format:main May 13, 2026
30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

breaking-change bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perf: disable auto-cleanup by default — hook adds multi-second latency every 20 commits and usually deletes nothing

2 participants