Apache Arrow does not support FieldRef to list of structs #60

Closed
eddyxu opened this issue Jul 28, 2022 · 2 comments
Labels: arrow, c++, help wanted, python

Comments

@eddyxu
Contributor

eddyxu commented Jul 28, 2022

Problem Statement

Apache Arrow does not support a field reference (FieldRef) into a list<struct> column, so a nested field such as annotations.label cannot be selected by name:

import duckdb
import lance

# Selecting a struct field inside a list<struct> column raises ArrowInvalid.
ds = lance.dataset("./coco.lance").scanner(columns=["id", "annotations.label"])

Error:

Traceback (most recent call last):
  File "/Users/lei/work/lance/./query.py", line 6, in <module>
    ds = lance.dataset("./coco.lance").scanner(columns=["id", "annotations.label"])
  File "pyarrow/_dataset.pyx", line 271, in pyarrow._dataset.Dataset.scanner
  File "pyarrow/_dataset.pyx", line 2328, in pyarrow._dataset.Scanner.from_dataset
  File "pyarrow/_dataset.pyx", line 2174, in pyarrow._dataset._populate_builder
  File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: No match for FieldRef.Name(annotations.label) in id: int64
width: int64
height: int64
file_name: string
image: struct<data: binary>
annotations: list<item: struct<area: double, box: struct<xmax: double, xmin: double, ymax: double, ymin: double>, label: string, label_id: int64, segmentation: struct<height: int64, polygon: list<item: list<item: double>>, rle: list<item: int64>, type: int64, width: int64>, supercategory: string>>
__index_level_0__: int64
__fragment_index: int32
__batch_index: int32
__last_in_fragment: bool
__filename: string

Expected Behavior

Using annotations.label should return values of type list<struct<label: string>>, a subset view of the original annotations list<struct> column.
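To make the expected result concrete, here is a minimal pure-Python sketch of the projection semantics, using plain dicts and lists rather than Arrow arrays. `project_list_of_structs` is a hypothetical helper, not part of Arrow or Lance, and the sample rows are made up to resemble the schema above. It only handles one level of nesting (a list column and a direct struct field), and it assumes a null list should stay null.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def project_list_of_structs(rows: List[Row], path: str) -> List[Optional[List[Row]]]:
    """Select one struct field from a list<struct> column, row by row.

    `path` is "<list column>.<struct field>", e.g. "annotations.label".
    Each output row keeps the list structure, but every struct is reduced
    to the single requested field.
    """
    column, field = path.split(".", 1)
    projected: List[Optional[List[Row]]] = []
    for row in rows:
        items = row.get(column)
        if items is None:
            # Assumption: a null list stays null in the projected view.
            projected.append(None)
        else:
            projected.append([{field: item.get(field)} for item in items])
    return projected


# Made-up rows shaped like the dataset in this issue (most fields omitted).
rows = [
    {"id": 1, "annotations": [{"label": "cat", "label_id": 17, "area": 10.0},
                              {"label": "dog", "label_id": 18, "area": 4.5}]},
    {"id": 2, "annotations": []},
    {"id": 3, "annotations": None},
]

labels = project_list_of_structs(rows, "annotations.label")
print(labels)
```

In Arrow terms, the same view would presumably reuse the list offsets of annotations and wrap only the label child array in a new struct, so no list data would need to be copied.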

@eddyxu eddyxu added the help wanted, python, c++, and arrow labels Jul 28, 2022
@eddyxu
Contributor Author

eddyxu commented Aug 26, 2022

Reported in Arrow's JIRA https://issues.apache.org/jira/browse/ARROW-17540

@changhiskhan
Contributor

@eddyxu we control this now, don't we? I think we could make list-of-struct a lot easier to work with.
