add_columns should support Blob v2 external URIs and clean up failed writes #7075

@yyzhao2025

Problem

add_columns had two gaps compared with the main write path when working with Blob v2 datasets:

  1. It could not resolve Blob v2 external URIs during the update/write path.
  2. It could leave orphaned data files and Blob v2 sidecars behind when the operation failed after partial writes.

This made add_columns inconsistent with the expected Blob v2 behavior and could leave storage artifacts behind on failure.

Blob v2 external URI resolution

add_columns now opens its update writer with the same Blob v2 external base resolution that the normal write path uses, so Blob v2 reference values that point at dataset-registered external URIs resolve correctly.
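
To make the idea concrete, here is a minimal sketch of what "external base resolution" could look like. It is not Lance's actual API: `BlobRef`, `BaseResolver`, the integer base ids, and the example URIs are all hypothetical names chosen for illustration. The only point is that every write path should resolve references through the same registry of dataset-registered bases.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class BlobRef:
    """Hypothetical Blob v2 reference: an optional base id plus a relative path."""
    base_id: Optional[int]  # None means "relative to the dataset's own storage"
    path: str


class BaseResolver:
    """Hypothetical registry mapping base ids to dataset-registered external URIs."""

    def __init__(self, dataset_root: str, external_bases: Dict[int, str]):
        self.dataset_root = dataset_root.rstrip("/")
        self.external_bases = {k: v.rstrip("/") for k, v in external_bases.items()}

    def resolve(self, ref: BlobRef) -> str:
        # References without a base id live inside the dataset itself.
        if ref.base_id is None:
            return f"{self.dataset_root}/{ref.path}"
        # References with a base id must point at a registered external base.
        if ref.base_id not in self.external_bases:
            raise ValueError(f"unregistered external base id: {ref.base_id}")
        return f"{self.external_bases[ref.base_id]}/{ref.path}"


# Both the normal write path and the add_columns update writer would be
# handed the same resolver, so they agree on how references resolve.
resolver = BaseResolver("s3://bucket/ds", {1: "s3://media/videos/"})
print(resolver.resolve(BlobRef(1, "clip.mp4")))
print(resolver.resolve(BlobRef(None, "data/0.blob")))
```

A writer opened without such a resolver has no way to turn a base id into a URI, which is the gap described under Problem.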

Failed write cleanup

add_columns now cleans up files created by the current failed attempt, including:

  • unfinished data files from the current writer
  • completed but uncommitted fragment outputs
  • Blob v2 sidecar directories created for those files

The cleanup logic also preserves the intended safety boundaries:

  • external-base files are never deleted
  • fragments already recovered from, or owned by, checkpoint state are never deleted
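
The cleanup rules and safety boundaries above can be sketched as follows. This is an illustration under assumptions, not the change's actual implementation: `AttemptTracker`, `run_attempt`, and the use of local filesystem paths are hypothetical stand-ins for whatever bookkeeping the real writer does against its object store.

```python
import shutil
from pathlib import Path
from typing import Callable, Iterable, List, Set


class AttemptTracker:
    """Hypothetical record of paths created by the current add_columns attempt."""

    def __init__(self, dataset_root: Path, checkpoint_owned: Iterable[Path] = ()):
        self.dataset_root = dataset_root.resolve()
        self.checkpoint_owned: Set[Path] = {p.resolve() for p in checkpoint_owned}
        self.created: List[Path] = []

    def track(self, path: Path) -> Path:
        # Register a data file or sidecar directory as soon as it is created.
        self.created.append(path)
        return path

    def _is_deletable(self, path: Path) -> bool:
        resolved = path.resolve()
        # Boundary 1: only paths inside the dataset root; external bases are skipped.
        inside_dataset = self.dataset_root in resolved.parents
        # Boundary 2: never touch fragments owned by checkpoint state.
        return inside_dataset and resolved not in self.checkpoint_owned

    def cleanup(self) -> List[Path]:
        removed = []
        for path in self.created:
            if not self._is_deletable(path) or not path.exists():
                continue
            if path.is_dir():
                shutil.rmtree(path)  # e.g. a sidecar directory
            else:
                path.unlink()  # e.g. an unfinished or uncommitted data file
            removed.append(path)
        self.created.clear()
        return removed


def run_attempt(
    tracker: AttemptTracker,
    write_steps: Iterable[Callable[[AttemptTracker], None]],
    commit: Callable[[], None],
) -> None:
    """Run the writes and the commit; on any failure, clean up and re-raise."""
    try:
        for step in write_steps:
            step(tracker)
        commit()
    except Exception:
        tracker.cleanup()
        raise
```

Tracking at creation time means unfinished files from the current writer and completed but uncommitted outputs are cleaned up by the same code path, and the two boundary checks live in one place so they cannot be skipped by accident.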

Scope

This issue covers the add_columns path and its Blob v2 / cleanup behavior.

Follow-up

alter_columns is not included in this change set.

alter_columns now shares some of the same lower-level machinery, but its commit-failure cleanup path should be handled in a separate follow-up PR to keep scope focused and reviewable.

Notes

This work is intended to keep add_columns behavior aligned with Lance Blob v2 design expectations:

  • consistent URI resolution behavior across write paths
  • no orphaned internal files on failed operations
  • no accidental deletion of external user-managed data
