[FEA] Nested types support #2857
Comments
I've changed the title since "compound" has a specific semantic meaning within libcudf++. Compound types refer to any type that has children, e.g., strings, dictionaries, nested, etc.
I cannot stress enough how I would love to see this...
I would like to add that Spark has native support for maps. There has been some confusion in the Arrow documentation about maps, but generally they are represented as a List of Key, Value structs.
This would also be useful for us for a number of our use cases, including cyBERT post-processing where we have to remove overlapping columns between rows (created as an artifact of the training/inference phase).
Would love to have this feature.
This will be invaluable for us as we use lists as elements in pandas dataframes a lot, and would love to switch to cudf!
Going to close this as libcudf now has both struct and list types. Support is not complete across all functions, but individual issues can be filed if specific functionality is missing.
Is your feature request related to a problem? Please describe.
cudf columns should support compound data types (e.g., structs, lists).
Describe the solution you'd like
Using the same data layout as Arrow would be nice for compatibility. A struct would have child columns and a validity vector (so the struct itself can be null, since a struct of null fields is semantically different from a null struct). A list would contain the standard validity vector, a data vector containing the concatenated data across all rows, and an offset vector. The offset vector indicates the start location of each row's list of data. Therefore a row's data list starts at the indicated offset and ends at the offset of the next row, which is why Arrow's offset buffer holds one more entry than there are rows.
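To make the list layout concrete, here is a minimal pure-Python sketch of the validity/offsets/values encoding described above. It is an illustration only, not libcudf or Arrow code: the function names `encode_list_column` and `get_row` are hypothetical, and the validity vector is modeled as a list of booleans rather than a packed bitmap.

```python
from typing import List, Optional, Sequence, Tuple


def encode_list_column(
    rows: Sequence[Optional[Sequence[int]]],
) -> Tuple[List[bool], List[int], List[int]]:
    """Flatten rows of lists into (validity, offsets, values).

    Hypothetical helper for illustration; a real implementation would use
    a packed validity bitmap and device memory.
    """
    validity: List[bool] = []
    offsets: List[int] = [0]  # one more entry than there are rows
    values: List[int] = []
    for row in rows:
        if row is None:
            # A null row contributes no values; its start and end offsets match.
            validity.append(False)
        else:
            validity.append(True)
            values.extend(row)
        offsets.append(len(values))
    return validity, offsets, values


def get_row(
    validity: Sequence[bool],
    offsets: Sequence[int],
    values: Sequence[int],
    i: int,
) -> Optional[List[int]]:
    """Recover row i: it spans values[offsets[i]:offsets[i + 1]]."""
    if not validity[i]:
        return None
    return list(values[offsets[i]:offsets[i + 1]])


validity, offsets, values = encode_list_column([[1, 2], None, [], [3]])
# validity -> [True, False, True, True]
# offsets  -> [0, 2, 2, 2, 3]
# values   -> [1, 2, 3]
```

Note that the null row and the empty list both occupy a zero-length range in `values`; only the validity vector tells them apart, which mirrors the null-struct versus struct-of-nulls distinction made above.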