
Add Puffin writer with deletion-vector-v1 blob support #7

@manuzhang

Description

Puffin reading is already supported (src/iceberg/puffin/file_metadata.{h,cc} and puffin_format.{h,cc}, including the kDeletionVectorV1 = "deletion-vector-v1" constant). The manifest entry already carries content_offset, content_size_in_bytes, and referenced_data_file for V3 deletion vectors, and the Snapshot::IsDeletionVector() branches already exist.

What's missing: a PuffinWriter (or equivalent) that can serialize a Roaring bitmap as a deletion-vector-v1 blob, write a Puffin file, and return the offsets needed to populate DataFile { content = DELETION_VECTORS, content_offset, content_size_in_bytes, referenced_data_file }.
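For reference, here is a minimal sketch of the deletion-vector-v1 blob encoding, based on my reading of the Iceberg Puffin spec. Treat the layout as an assumption to double-check against the spec. The blob holds a 4-byte big-endian length covering magic plus vector, the magic bytes D1 D3 39 64, and the 64-bit position vector. That vector is an 8-byte little-endian count of 32-bit Roaring bitmaps, each preceded by a 4-byte little-endian key holding the high 32 bits of the position. A big-endian CRC-32 over magic plus vector closes the blob. All function names are hypothetical. The 32-bit bitmaps are hand-rolled in the Roaring "portable" format using only array and bitset containers (no run containers) instead of calling a real Roaring library.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Bitwise CRC-32 (zlib/IEEE polynomial, reflected form 0xEDB88320).
uint32_t Crc32(const uint8_t* data, size_t len) {
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; ++i) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; ++bit) {
      crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
  }
  return ~crc;
}

// Appends the low `bytes` bytes of `v` in little-endian order.
void PutLE(std::vector<uint8_t>& out, uint64_t v, int bytes) {
  for (int i = 0; i < bytes; ++i) out.push_back(static_cast<uint8_t>(v >> (8 * i)));
}

void PutBE32(std::vector<uint8_t>& out, uint32_t v) {
  for (int i = 3; i >= 0; --i) out.push_back(static_cast<uint8_t>(v >> (8 * i)));
}

// 32-bit Roaring portable format without run containers (cookie 12346).
std::vector<uint8_t> SerializeRoaring32(const std::set<uint32_t>& values) {
  std::map<uint16_t, std::vector<uint16_t>> containers;  // high 16 bits -> sorted low 16 bits
  for (uint32_t v : values) containers[v >> 16].push_back(static_cast<uint16_t>(v & 0xFFFF));

  const uint32_t n = static_cast<uint32_t>(containers.size());
  std::vector<uint8_t> out;
  PutLE(out, 12346, 4);  // SERIAL_COOKIE_NO_RUNCONTAINER
  PutLE(out, n, 4);      // container count
  for (const auto& [key, lows] : containers) {
    PutLE(out, key, 2);
    PutLE(out, lows.size() - 1, 2);  // cardinality - 1
  }
  // Offset header: byte offset of each container from the start of this stream.
  uint32_t offset = 8 + 8 * n;
  for (const auto& [key, lows] : containers) {
    PutLE(out, offset, 4);
    offset += static_cast<uint32_t>(lows.size() <= 4096 ? 2 * lows.size() : 8192);
  }
  for (const auto& [key, lows] : containers) {
    if (lows.size() <= 4096) {
      for (uint16_t low : lows) PutLE(out, low, 2);  // array container
    } else {
      std::array<uint64_t, 1024> words{};  // bitset container: 65536 bits
      for (uint16_t low : lows) words[low >> 6] |= uint64_t{1} << (low & 63);
      for (uint64_t w : words) PutLE(out, w, 8);
    }
  }
  return out;
}

// 64-bit position vector: 8-byte count, then (4-byte key, 32-bit bitmap) per key.
std::vector<uint8_t> SerializePositions(const std::vector<uint64_t>& positions) {
  std::map<uint32_t, std::set<uint32_t>> by_key;  // high 32 bits -> low 32 bits
  for (uint64_t pos : positions) {
    by_key[static_cast<uint32_t>(pos >> 32)].insert(static_cast<uint32_t>(pos));
  }
  std::vector<uint8_t> out;
  PutLE(out, by_key.size(), 8);
  for (const auto& [key, lows] : by_key) {
    PutLE(out, key, 4);
    const std::vector<uint8_t> bitmap = SerializeRoaring32(lows);
    out.insert(out.end(), bitmap.begin(), bitmap.end());
  }
  return out;
}

// Blob: BE length(magic + vector) | D1 D3 39 64 | vector | BE CRC-32(magic + vector).
std::vector<uint8_t> EncodeDeletionVectorBlob(const std::vector<uint64_t>& positions) {
  const std::vector<uint8_t> bitmap_bytes = SerializePositions(positions);
  std::vector<uint8_t> blob;
  PutBE32(blob, static_cast<uint32_t>(4 + bitmap_bytes.size()));
  blob.insert(blob.end(), {0xD1, 0xD3, 0x39, 0x64});
  blob.insert(blob.end(), bitmap_bytes.begin(), bitmap_bytes.end());
  PutBE32(blob, Crc32(blob.data() + 4, blob.size() - 4));  // skip the length prefix
  return blob;
}
```

A real writer would more likely delegate the 32-bit bitmaps to a Roaring library's portable serialization, which can also emit run containers. The hand-rolled version only makes the byte layout visible.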

src/iceberg/data/position_delete_writer.{h,cc} writes the older Parquet-based position-delete format, not Puffin DVs.

API sketch

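```cpp
class PuffinWriter {
 public:
  // Serializes `bitmap` as a deletion-vector-v1 blob for `referenced_data_file`.
  Status WriteDeletionVector(const std::string& referenced_data_file,
                             const RoaringBitmap& bitmap);
  // Writes the Puffin footer and returns blob metadata (offsets, lengths).
  Result<PuffinFooter> Finish();
};
```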
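The container side can be sketched separately. Below is a hypothetical in-memory writer. InMemoryPuffinWriter and BlobInfo are illustrative names, not the project's API. It takes already-encoded blob bytes rather than the RoaringBitmap type above, whose definition isn't shown here. It writes the Puffin layout as I understand it from the spec: magic "PFA1", the blobs, then a footer of magic, JSON payload, 4-byte little-endian payload size, 4 flag bytes, and magic. Along the way it records each blob's offset and length, the two values that would populate content_offset and content_size_in_bytes next to referenced_data_file. snapshot-id and sequence-number are written as -1, on the assumption that DV blobs inherit them at commit time. The fields list is left to the caller.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-blob metadata mirroring the BlobMetadata keys in the footer JSON.
struct BlobInfo {
  std::string type;
  std::vector<int32_t> fields;
  int64_t offset = 0;  // -> DataFile content_offset
  int64_t length = 0;  // -> DataFile content_size_in_bytes
  std::string referenced_data_file;
  int64_t cardinality = 0;
};

class InMemoryPuffinWriter {
 public:
  InMemoryPuffinWriter() { Append(kMagic, 4); }  // file header magic

  // Appends an already-encoded deletion-vector-v1 blob and returns its metadata.
  BlobInfo AddDeletionVector(const std::vector<uint8_t>& blob,
                             const std::string& referenced_data_file,
                             int64_t cardinality, std::vector<int32_t> fields = {}) {
    blobs_.push_back({"deletion-vector-v1", std::move(fields),
                      static_cast<int64_t>(bytes_.size()),
                      static_cast<int64_t>(blob.size()), referenced_data_file, cardinality});
    Append(blob.data(), blob.size());
    return blobs_.back();
  }

  // Footer: magic | JSON payload | payload size (LE) | flags | magic.
  std::vector<uint8_t> Finish() {
    std::string json = "{\"blobs\":[";
    for (size_t i = 0; i < blobs_.size(); ++i) {
      const BlobInfo& b = blobs_[i];
      if (i > 0) json += ",";
      json += "{\"type\":\"" + b.type + "\",\"fields\":[";
      for (size_t f = 0; f < b.fields.size(); ++f) {
        json += (f ? "," : "") + std::to_string(b.fields[f]);
      }
      json += "],\"snapshot-id\":-1,\"sequence-number\":-1";
      json += ",\"offset\":" + std::to_string(b.offset) +
              ",\"length\":" + std::to_string(b.length);
      json += ",\"properties\":{\"referenced-data-file\":\"" + Escape(b.referenced_data_file) +
              "\",\"cardinality\":\"" + std::to_string(b.cardinality) + "\"}}";
    }
    json += "]}";
    Append(kMagic, 4);
    Append(reinterpret_cast<const uint8_t*>(json.data()), json.size());
    for (int i = 0; i < 4; ++i) bytes_.push_back(static_cast<uint8_t>(json.size() >> (8 * i)));
    bytes_.insert(bytes_.end(), 4, uint8_t{0});  // flags: footer payload not compressed
    Append(kMagic, 4);
    return bytes_;
  }

 private:
  static constexpr uint8_t kMagic[4] = {0x50, 0x46, 0x41, 0x31};  // "PFA1"

  // Minimal JSON string escaping (only '"' and '\\'); enough for this sketch.
  static std::string Escape(const std::string& s) {
    std::string out;
    for (char c : s) {
      if (c == '"' || c == '\\') out += '\\';
      out += c;
    }
    return out;
  }

  void Append(const uint8_t* data, size_t n) { bytes_.insert(bytes_.end(), data, data + n); }

  std::vector<uint8_t> bytes_;
  std::vector<BlobInfo> blobs_;
};
```

Having the writer return the offsets directly, rather than re-reading its own footer, keeps DataFile construction in one place. The total size of the returned buffer would presumably become the DV file's file_size_in_bytes.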
