[BUG] Extracting a nested list-host-scalar throws error #11670

Closed
galipremsagar opened this issue Sep 8, 2022 · 0 comments · Fixed by #11671
Labels: bug (Something isn't working), Python (Affects Python cuDF API)

Describe the bug
When a nested list column contains a struct column, extracting a host scalar from it (for example with `s[0]`) raises a `RuntimeError`.

Steps/Code to reproduce bug

In [1]: import cudf

In [2]: s = cudf.Series([[[{'a':1, 'b':2, 'c':10}]]])

In [3]: s[0]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In [3], line 1
----> 1 s[0]

File /nvme/0/pgali/envs/cudfdev/lib/python3.9/contextlib.py:79, in ContextDecorator.__call__.<locals>.inner(*args, **kwds)
     76 @wraps(func)
     77 def inner(*args, **kwds):
     78     with self._recreate_cm():
---> 79         return func(*args, **kwds)

File /nvme/0/pgali/envs/cudfdev/lib/python3.9/site-packages/cudf/core/series.py:1167, in Series.__getitem__(self, arg)
   1165     return self.iloc[arg]
   1166 else:
-> 1167     return self.loc[arg]

File /nvme/0/pgali/envs/cudfdev/lib/python3.9/contextlib.py:79, in ContextDecorator.__call__.<locals>.inner(*args, **kwds)
     76 @wraps(func)
     77 def inner(*args, **kwds):
     78     with self._recreate_cm():
---> 79         return func(*args, **kwds)

File /nvme/0/pgali/envs/cudfdev/lib/python3.9/site-packages/cudf/core/series.py:258, in _SeriesLocIndexer.__getitem__(self, arg)
    255 except (TypeError, KeyError, IndexError, ValueError):
    256     raise KeyError(arg)
--> 258 return self._frame.iloc[arg]

File /nvme/0/pgali/envs/cudfdev/lib/python3.9/contextlib.py:79, in ContextDecorator.__call__.<locals>.inner(*args, **kwds)
     76 @wraps(func)
     77 def inner(*args, **kwds):
     78     with self._recreate_cm():
---> 79         return func(*args, **kwds)

File /nvme/0/pgali/envs/cudfdev/lib/python3.9/site-packages/cudf/core/series.py:180, in _SeriesIlocIndexer.__getitem__(self, arg)
    178 if isinstance(arg, tuple):
    179     arg = list(arg)
--> 180 data = self._frame._get_elements_from_column(arg)
    182 if (
    183     isinstance(data, (dict, list))
    184     or _is_scalar_or_zero_d_array(data)
    185     or _is_null_host_scalar(data)
    186 ):
    187     return data

File /nvme/0/pgali/envs/cudfdev/lib/python3.9/site-packages/cudf/core/single_column_frame.py:385, in SingleColumnFrame._get_elements_from_column(self, arg)
    379 def _get_elements_from_column(self, arg) -> Union[ScalarLike, ColumnBase]:
    380     # A generic method for getting elements from a column that supports a
    381     # wide range of different inputs. This method should only used where
    382     # _absolutely_ necessary, since in almost all cases a more specific
    383     # method can be used e.g. element_indexing or slice.
    384     if _is_scalar_or_zero_d_array(arg):
--> 385         return self._column.element_indexing(int(arg))
    386     elif isinstance(arg, slice):
    387         start, stop, stride = arg.indices(len(self))

File /nvme/0/pgali/envs/cudfdev/lib/python3.9/site-packages/cudf/core/column/column.py:445, in ColumnBase.element_indexing(self, index)
    442 if idx > len(self) - 1 or idx < 0:
    443     raise IndexError("single positional indexer is out-of-bounds")
--> 445 return libcudf.copying.get_element(self, idx).value

File scalar.pyx:174, in cudf._lib.scalar.DeviceScalar.value.__get__()

File scalar.pyx:146, in cudf._lib.scalar.DeviceScalar._to_host_scalar()

File scalar.pyx:431, in cudf._lib.scalar._get_py_list_from_list()

File interop.pyx:144, in cudf._lib.interop.to_arrow()

RuntimeError: cuDF failure at: /nvme/0/pgali/cudf/cpp/src/interop/to_arrow.cu:280: Number of field names and number of children doesn't match
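
The message suggests that, during the Arrow conversion, the struct column had three children while the metadata passed alongside it carried a different number of field names. Below is a minimal pure-Python sketch of that kind of check. It is not libcudf's code: `Col`, `Meta`, and `check` are hypothetical names, and a list is simplified to a single element child.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Meta:
    """Hypothetical metadata node: a name plus metadata for each child."""
    name: str = ""
    children: List["Meta"] = field(default_factory=list)


@dataclass
class Col:
    """Hypothetical column: a kind ("int", "list", "struct") plus child columns."""
    kind: str
    children: List["Col"] = field(default_factory=list)


def check(col: Col, meta: Meta) -> None:
    """Walk column and metadata together; structs need one name per child."""
    if col.kind == "struct":
        if len(meta.children) != len(col.children):
            raise RuntimeError(
                "Number of field names and number of children doesn't match"
            )
        for child, child_meta in zip(col.children, meta.children):
            check(child, child_meta)
    elif col.kind == "list":
        # Assumption: fall back to empty metadata when none was supplied.
        child_meta = meta.children[0] if meta.children else Meta()
        check(col.children[0], child_meta)


# list<list<struct<a, b, c>>>, mirroring the Series in the reproducer
struct = Col("struct", [Col("int"), Col("int"), Col("int")])
nested = Col("list", [Col("list", [struct])])

# Metadata that stops at the top level: the struct ends up with zero names.
flat_meta = Meta("s")

# Metadata that mirrors the full nesting.
full_meta = Meta("s", [Meta("", [Meta("", [Meta("a"), Meta("b"), Meta("c")])])])
```

In this toy model `check(nested, flat_meta)` raises the same message as the traceback, while `check(nested, full_meta)` passes, which is consistent with the error only showing up once the struct sits inside nested lists.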

Expected behavior

In [4]: s[0]
Out[4]: 
[[{'a': 1, 'b': 2, 'c': 10}]]

Environment overview

  • Environment location: [Bare-metal]
  • Method of cuDF install: [from source]
@galipremsagar galipremsagar added bug Something isn't working Python Affects Python cuDF API. labels Sep 8, 2022
@galipremsagar galipremsagar self-assigned this Sep 8, 2022
@galipremsagar galipremsagar added this to Issue-Needs prioritizing in v22.10 Release via automation Sep 8, 2022
@galipremsagar galipremsagar changed the title [BUG] Extracting a list-host-scalar throws error [BUG] Extracting a nested list-host-scalar throws error Sep 8, 2022
@galipremsagar galipremsagar moved this from Issue-Needs prioritizing to Issue-P1 in v22.10 Release Sep 8, 2022
rapids-bot bot pushed a commit that referenced this issue Sep 12, 2022
…1671)

This PR:
- [x] Fixes: #11670, by correctly generating the `column_metadata` for nested scenarios.
- [x] Also fixes a dtype mismatch that occurred after updating `children` in a `ListColumn`. See the pytest in #11671.
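
As a rough illustration of what "generating the metadata for nested scenarios" can look like, here is a pure-Python sketch that builds a metadata tree by recursing through a nested type description. This is not the code from the PR: `build_metadata` and the dict/list encoding of dtypes are hypothetical, chosen only so the example runs without cuDF.

```python
# Toy dtype encoding (hypothetical, not cuDF's dtype classes):
#   "int64"               -> leaf type
#   [element_dtype]       -> list of element_dtype
#   {"a": ..., "b": ...}  -> struct with named fields


def build_metadata(dtype, name=""):
    """Return a {"name": ..., "children": [...]} tree that mirrors dtype."""
    if isinstance(dtype, dict):
        # Struct: one named child per field, each built recursively.
        children = [build_metadata(t, n) for n, t in dtype.items()]
    elif isinstance(dtype, list):
        # List: a single unnamed child for the element type (simplified).
        children = [build_metadata(dtype[0])]
    else:
        # Leaf: no children.
        children = []
    return {"name": name, "children": children}


# list<list<struct<a, b, c>>>, the dtype of the Series in the reproducer
dtype = [[{"a": "int64", "b": "int64", "c": "int64"}]]
meta = build_metadata(dtype, "s")

# Two levels of list metadata sit above the struct's metadata.
struct_meta = meta["children"][0]["children"][0]
print([child["name"] for child in struct_meta["children"]])  # ['a', 'b', 'c']
```

The point of the recursion is that the struct's field names survive no matter how many list levels wrap it, so the count of names matches the count of children at every depth.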

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Ashwin Srinath (https://github.com/shwina)
  - Lawrence Mitchell (https://github.com/wence-)

URL: #11671
v22.10 Release automation moved this from Issue-P1 to Done Sep 12, 2022