Support version metadata when writing a dataset #514

Open
changhiskhan opened this issue Feb 2, 2023 · 0 comments
Labels
enhancement, help wanted, python, rust

Comments

Contributor

changhiskhan commented Feb 2, 2023

Motivation

Version metadata attached when a dataset is written can be used for debugging and lineage tracking.

Proto definition

  1. Deprecate:

    lance/protos/table.proto

    Lines 134 to 139 in cb092db

    // Auxiliary Data attached to a version.
    // Only load on-demand.
    message VersionAuxData {
      // key-value metadata.
      map<string, bytes> metadata = 3;
    }

  2. Make it an optional field instead.

Write

  1. Add a Rust struct for VersionAuxData (https://github.com/eto-ai/lance/blob/main/protos/format.proto#L79), along with protobuf <-> Rust conversion traits.
  2. Write the aux data section similarly to how index metadata is written (https://github.com/eto-ai/lance/blob/main/rust/src/io/writer.rs#L79). write_manifest should take the additional version metadata as a parameter.
    Note: make sure the version_aux_data file position is tested.
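
A rough sketch of the two Write steps above, not the actual Lance code. Everything here is an assumption for illustration: the `pb` module is a hand-written stand-in for the prost-generated type, `encode_placeholder` is a toy length-prefixed layout standing in for real protobuf encoding, and `write_version_aux_data` is a hypothetical helper that `write_manifest` could call. The point is only that the writer records the file position where the section starts, so that position can be stored and tested.

```rust
use std::collections::HashMap;
use std::io::{self, Seek, Write};

/// Hypothetical stand-in for the prost-generated protobuf type.
mod pb {
    use std::collections::HashMap;

    #[derive(Debug, Default, Clone, PartialEq)]
    pub struct VersionAuxData {
        pub metadata: HashMap<String, Vec<u8>>,
    }
}

/// Rust-side struct for the version aux data (name and shape assumed).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct VersionAuxData {
    pub metadata: HashMap<String, Vec<u8>>,
}

// Conversion traits between the Rust struct and the protobuf type.
impl From<&VersionAuxData> for pb::VersionAuxData {
    fn from(data: &VersionAuxData) -> Self {
        Self { metadata: data.metadata.clone() }
    }
}

impl From<pb::VersionAuxData> for VersionAuxData {
    fn from(data: pb::VersionAuxData) -> Self {
        Self { metadata: data.metadata }
    }
}

/// Placeholder encoding (NOT protobuf): for each entry, sorted by key,
/// a u32 little-endian length followed by the bytes, for key then value.
fn encode_placeholder(data: &pb::VersionAuxData) -> Vec<u8> {
    let mut entries: Vec<_> = data.metadata.iter().collect();
    entries.sort(); // deterministic output regardless of HashMap order
    let mut buf = Vec::new();
    for (key, value) in entries {
        buf.extend_from_slice(&(key.len() as u32).to_le_bytes());
        buf.extend_from_slice(key.as_bytes());
        buf.extend_from_slice(&(value.len() as u32).to_le_bytes());
        buf.extend_from_slice(value);
    }
    buf
}

/// Writes the aux data section at the writer's current position and
/// returns that position, or `None` when no metadata was supplied.
pub fn write_version_aux_data<W: Write + Seek>(
    writer: &mut W,
    aux_data: Option<&VersionAuxData>,
) -> io::Result<Option<u64>> {
    let aux_data = match aux_data {
        Some(data) => data,
        None => return Ok(None),
    };
    let position = writer.stream_position()?;
    let body = encode_placeholder(&pb::VersionAuxData::from(aux_data));
    // Length prefix so a reader knows how many bytes belong to the section.
    writer.write_all(&(body.len() as u32).to_le_bytes())?;
    writer.write_all(&body)?;
    Ok(Some(position))
}
```

Returning `Option<u64>` keeps the "no metadata" case explicit, which lines up with making the field optional. A test for the file position can write some bytes first and assert that the returned offset equals the number of bytes already written.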

Read

  1. Add a function to read the version aux data to the Version struct (https://github.com/eto-ai/lance/blob/main/rust/src/dataset.rs#L66).
  2. Instead of returning a PyDict for each version (https://github.com/eto-ai/lance/blob/main/python/src/dataset.rs#L182), create a pyo3 struct Version that is exposed to Python. This struct should have a function that reads the aux data and caches it in memory.
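
For the "read and cache in memory" part, here is a minimal plain-Rust sketch of the on-demand load, without the pyo3 wrapper. The field names, the `aux_data` method, and the length-prefixed placeholder layout are all assumptions for illustration; real code would decode the protobuf message instead. `std::cell::OnceCell` is used for the cache; a struct shared across threads would likely want `std::sync::OnceLock`.

```rust
use std::cell::OnceCell;
use std::collections::HashMap;
use std::io::{self, Read, Seek, SeekFrom};

pub type Metadata = HashMap<String, Vec<u8>>;

/// Hypothetical Version struct that loads aux data only on demand.
pub struct Version {
    pub version: u64,
    /// File position of the aux data section, if one was written.
    aux_data_position: Option<u64>,
    /// Filled on the first successful read, reused afterwards.
    cache: OnceCell<Metadata>,
}

/// Reads a u32 little-endian length followed by that many bytes.
fn read_chunk<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    let mut len = [0u8; 4];
    reader.read_exact(&mut len)?;
    let mut buf = vec![0u8; u32::from_le_bytes(len) as usize];
    reader.read_exact(&mut buf)?;
    Ok(buf)
}

/// Placeholder decoding (NOT protobuf): a length-prefixed section that
/// contains length-prefixed key/value pairs.
fn read_placeholder<R: Read + Seek>(reader: &mut R, position: u64) -> io::Result<Metadata> {
    reader.seek(SeekFrom::Start(position))?;
    let mut section = io::Cursor::new(read_chunk(reader)?);
    let mut metadata = Metadata::new();
    while (section.position() as usize) < section.get_ref().len() {
        let key = String::from_utf8(read_chunk(&mut section)?)
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
        let value = read_chunk(&mut section)?;
        metadata.insert(key, value);
    }
    Ok(metadata)
}

impl Version {
    pub fn new(version: u64, aux_data_position: Option<u64>) -> Self {
        Self { version, aux_data_position, cache: OnceCell::new() }
    }

    /// Returns the aux data, touching the reader only on the first call.
    /// A version without an aux data section yields an empty map.
    pub fn aux_data<R: Read + Seek>(&self, reader: &mut R) -> io::Result<&Metadata> {
        if let Some(cached) = self.cache.get() {
            return Ok(cached);
        }
        let loaded = match self.aux_data_position {
            Some(position) => read_placeholder(reader, position)?,
            None => Metadata::new(),
        };
        Ok(self.cache.get_or_init(|| loaded))
    }
}
```

A failed read leaves the cache empty, so a later call can retry rather than caching the error.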
changhiskhan changed the title from "write_dataset should support version metadata parameter" to "Support version metadata when writing a dataset" on Feb 15, 2023
changhiskhan added the good first issue, help wanted, python, rust, and enhancement labels and removed the good first issue label on Feb 15, 2023