Error writing Lance dataset #791

Open
changhiskhan opened this issue Apr 20, 2023 · 0 comments
Labels
bug Something isn't working rust Rust related tasks

Comments

@changhiskhan
Contributor

First reported by @cemoody (on Discord).

Lance version: v0.4.2, v0.4.3

When writing this dataset:

lance.write_dataset(combined.select(['id', 'vector']), lance_dir)

An error is raised:

thread '<unnamed>' panicked at 'assertion failed: (offset + length) <= self.len()', /root/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow-data-33.0.0/src/data.rs:543:9
*** pyo3_runtime.PanicException: assertion failed: (offset + length) <= self.len()

However, writing each column individually works fine:

(Pdb) lance.write_dataset(combined.select(['id']), lance_dir / '3')
<lance.dataset.LanceDataset object at 0x7f4b8c15ffa0>
(Pdb) lance.write_dataset(combined.select(['vector']), lance_dir / '4')
<lance.dataset.LanceDataset object at 0x7f4b8c15fe20>

Workaround:

Read the underlying data via pandas first, then convert it to a PyArrow table before writing.

@changhiskhan changhiskhan added the bug Something isn't working label Apr 20, 2023
@eddyxu eddyxu self-assigned this Apr 20, 2023
@gsilvestrin gsilvestrin self-assigned this Apr 21, 2023
@changhiskhan changhiskhan added the rust Rust related tasks label Jul 2, 2023