Description
A Dataset handle at an older version (via checkout_version) has no way to signal cleanup_old_versions that it's in use. If cleanup runs while a reader is mid-scan, the manifest and data files for that version can be deleted out from under it, and the scan fails with object-store NotFound errors.
Reproduce
- Open a dataset and
checkout_version(N) where N is older than lance.auto_cleanup.older_than.
- Start a scan. While the scan is running, run
cleanup_old_versions with a before_version that includes N (or let auto-cleanup fire on the next commit).
- The scan fails mid-stream with
NotFound from the object store.
Current workarounds
Create a temporary tag before the read and delete it after. Works, but every reader has to cooperate and it gets awkward on auto-cleanup-per-commit deployments — cleanup runs on every write.
What I'd want
A short-lived, renewable signal that a reader can publish for the version it's using: cleanup treats the version as retained while the signal is live, and a crashed reader doesn't pin the version forever. Something like an advisory lease with a TTL.
Happy to put up a PR if this direction sounds reasonable.
Description
A
Datasethandle at an older version (viacheckout_version) has no way to signalcleanup_old_versionsthat it's in use. If cleanup runs while a reader is mid-scan, the manifest and data files for that version can be deleted out from under it, and the scan fails with object-storeNotFounderrors.Reproduce
checkout_version(N)where N is older thanlance.auto_cleanup.older_than.cleanup_old_versionswith abefore_versionthat includes N (or let auto-cleanup fire on the next commit).NotFoundfrom the object store.Current workarounds
Create a temporary tag before the read and delete it after. Works, but every reader has to cooperate and it gets awkward on auto-cleanup-per-commit deployments — cleanup runs on every write.
What I'd want
A short-lived, renewable signal that a reader can publish for the version it's using: cleanup treats the version as retained while the signal is live, and a crashed reader doesn't pin the version forever. Something like an advisory lease with a TTL.
Happy to put up a PR if this direction sounds reasonable.