Open
Labels: bug (Something isn't working)
Description
Currently, Arrow does not support pyarrow.compute.replace_with_mask for struct arrays:
apache/arrow#29558
That's why we have our own implementation, used by NestedExtensionArray.__setitem__(). The implementation has the overhead of creating a len(self)-sized struct array to perform the replacement. This approach works well when we replace many elements, but when we replace just a few, it produces a large memory footprint and probably takes a while.
An alternative approach would be to copy the original array to np.ndarray[pa.StructScalar], replace the elements in place, and convert it back:
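    import numpy as np
    import pyarrow as pa

    def replace_with_mask(array: pa.ChunkedArray, mask: pa.BooleanArray, value: pa.Array) -> pa.ChunkedArray:
        """Replace the elements of the array with the value where the mask is True"""
        # Copy into an object array of pa.StructScalar and replace in place
        np_array = np.fromiter(array, dtype=object)
        np_array[mask.to_numpy(zero_copy_only=False)] = np.fromiter(value, dtype=object)
        # Pass the original type so the struct schema is not re-inferred
        new_pa_array = pa.array(np_array, type=array.type)
        return pa.chunked_array([new_pa_array])

We should create a benchmark and see which approach is faster and has a smaller memory footprint.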
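One way such a benchmark could be sketched is a small harness that times a callable and records its peak memory. The names `bench` and `BenchResult` are hypothetical, not part of the code base. Note that `tracemalloc` only sees allocations made through Python's allocator (NumPy reports its buffers there as well); Arrow buffers come from Arrow's own memory pool, so for those a counter such as `pa.total_allocated_bytes()` would probably need to be sampled instead.

```python
import time
import tracemalloc
from typing import Callable, NamedTuple


class BenchResult(NamedTuple):
    seconds: float   # best wall-clock time over all timed runs
    peak_bytes: int  # peak Python-tracked memory during one traced run


def bench(func: Callable[[], object], repeats: int = 5) -> BenchResult:
    """Time func() `repeats` times, then measure peak memory in a separate run."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)

    # Trace memory in its own run so tracing overhead does not skew the timings.
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return BenchResult(best, peak)
```

Each implementation could then be wrapped in a lambda, for example `bench(lambda: replace_with_mask(array, mask, value))`, and run twice: once with a mask that selects only a few elements and once with a mask that selects most of them, since the two approaches are expected to behave differently in those regimes.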