Closed
Labels
bug: Something that is supposed to be working; but isn't
data: Ray Data-related issues
stale: The issue is stale. It will be closed within 7 days unless there are further conversation
Description
Ray 2.2 on macOS, pyarrow==6.0.1, Python 3.7
stacktrace:
File "python/ray/_raylet.pyx", line 830, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 834, in ray._raylet.execute_task
File "/Users/cade/miniconda3/envs/anyscale/lib/python3.7/site-packages/ray/data/grouped_dataset.py", line 58, in map
parts = [BlockAccessor.for_block(p).combine(key, aggs) for p in partitions]
File "/Users/cade/miniconda3/envs/anyscale/lib/python3.7/site-packages/ray/data/grouped_dataset.py", line 58, in <listcomp>
parts = [BlockAccessor.for_block(p).combine(key, aggs) for p in partitions]
File "/Users/cade/miniconda3/envs/anyscale/lib/python3.7/site-packages/ray/data/_internal/arrow_block.py", line 515, in combine
return builder.build()
File "/Users/cade/miniconda3/envs/anyscale/lib/python3.7/site-packages/ray/data/_internal/table_block.py", line 98, in build
tables = [self._table_from_pydict(self._columns)]
File "/Users/cade/miniconda3/envs/anyscale/lib/python3.7/site-packages/ray/data/_internal/arrow_block.py", line 123, in _table_from_pydict
return pyarrow.Table.from_pydict(columns)
File "pyarrow/table.pxi", line 1724, in pyarrow.lib.Table.from_pydict
File "pyarrow/table.pxi", line 2368, in pyarrow.lib._from_pydict
File "pyarrow/array.pxi", line 341, in pyarrow.lib.asarray
File "pyarrow/array.pxi", line 315, in pyarrow.lib.array
File "pyarrow/array.pxi", line 39, in pyarrow.lib._sequence_to_array
File "pyarrow/error.pxi", line 143, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Could not convert {'name': 'C', 'amount': 10, 'country': 'C1'} with type ArrowRow: did not recognize Python value type when inferring an Arrow data type
repro script:
#!/usr/bin/env python3
import ray
from ray.data.aggregate import AggregateFn
data = [
    {'name': 'A', 'amount': 100, 'country': 'C1'},
    {'name': 'B', 'amount': 200, 'country': 'C2'},
    {'name': 'C', 'amount': 10, 'country': 'C1'},
    {'name': 'D', 'amount': 500, 'country': 'C2'},
    {'name': 'E', 'amount': 400, 'country': 'C3'},
]
ds = ray.data.from_items(data)
ds = ds.groupby('country')
result = ds.aggregate(AggregateFn(
    init=lambda k: [],
    accumulate_row=lambda a, r: a + [r],
    merge=lambda a1, a2: a1['amount'] + a2['amount'],
    finalize=lambda a: a
))
full log https://gist.github.com/cadedaniel/1080563aae30309aef98505aef9fc6bc
pip freeze https://gist.github.com/cadedaniel/480a95d8d29da7795ebd19f092253b44
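A possible reading of the failure, offered as a sketch rather than a confirmed diagnosis: the error message says pyarrow could not convert a value "with type ArrowRow", which suggests the accumulator list holds row wrapper objects instead of plain Python values. Separately, the `merge` lambda in the repro indexes two lists with `'amount'`, which would raise a `TypeError` if it were ever reached. The pure-Python model below does not use Ray at all. The `Row` class and the `aggregate` helper are hypothetical stand-ins that only mimic the init / accumulate_row / merge / finalize flow, and the assumption that a real row can be copied with `dict(r)` is untested against Ray 2.2.

```python
from collections.abc import Mapping


class Row(Mapping):
    """Hypothetical stand-in for a dict-like row wrapper (not Ray's ArrowRow)."""

    def __init__(self, values):
        self._values = dict(values)

    def __getitem__(self, key):
        return self._values[key]

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)


def aggregate(partitions, key, init, accumulate_row, merge, finalize):
    """Hypothetical model: accumulate per partition, then merge partials per key."""
    partials = []
    for rows in partitions:
        accs = {}
        for row in rows:
            k = row[key]
            if k not in accs:
                accs[k] = init(k)
            accs[k] = accumulate_row(accs[k], row)
        partials.append(accs)

    merged = {}
    for accs in partials:
        for k, acc in accs.items():
            merged[k] = merge(merged[k], acc) if k in merged else acc
    return {k: finalize(acc) for k, acc in merged.items()}


data = [
    {'name': 'A', 'amount': 100, 'country': 'C1'},
    {'name': 'B', 'amount': 200, 'country': 'C2'},
    {'name': 'C', 'amount': 10, 'country': 'C1'},
    {'name': 'D', 'amount': 500, 'country': 'C2'},
    {'name': 'E', 'amount': 400, 'country': 'C3'},
]
# Two partitions, so that merge is exercised for keys C1 and C2.
partitions = [[Row(d) for d in data[:2]], [Row(d) for d in data[2:]]]

# Variant 1: collect rows per group, but store plain dicts in the accumulator
# and merge by list concatenation.
collected = aggregate(
    partitions,
    'country',
    init=lambda k: [],
    accumulate_row=lambda a, r: a + [dict(r)],
    merge=lambda a1, a2: a1 + a2,
    finalize=lambda a: a,
)

# Variant 2: keep a scalar accumulator if only the summed amount is needed.
totals = aggregate(
    partitions,
    'country',
    init=lambda k: 0,
    accumulate_row=lambda a, r: a + r['amount'],
    merge=lambda a1, a2: a1 + a2,
    finalize=lambda a: a,
)

# The merge from the repro fails on list accumulators regardless of Ray.
repro_merge = lambda a1, a2: a1['amount'] + a2['amount']
try:
    repro_merge([data[0]], [data[2]])
    repro_merge_error = None
except TypeError as exc:
    repro_merge_error = exc
```

If the same shape carries over to Ray, either variant keeps the intermediate accumulator made of values that pyarrow can infer a type for. Whether nested lists of dicts are accepted as aggregation output in Ray 2.2 is not verified here.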