[FEA] Nested types support #2857

Closed
jlowe opened this issue Sep 23, 2019 · 7 comments
Labels
feature request New feature or request Java Affects Java cuDF API. libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API. Spark Functionality that helps Spark RAPIDS

Comments

@jlowe
Member

jlowe commented Sep 23, 2019

Is your feature request related to a problem? Please describe.
cudf columns should support compound data types (e.g., structs, lists).

Describe the solution you'd like
Using the same data layout as Arrow would be nice for compatibility. A struct would have child columns and a validity vector (so the struct itself can be null, since a struct of null fields is semantically different from a null struct). A list would contain the standard validity vector, a data vector containing the concatenated data across all rows, and an offset vector. The offset vector indicates where each row's list begins in the data vector, so a row's list starts at its own offset and ends at the offset of the next row.
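
As a rough illustration of the layout described above, here is a minimal pure-Python sketch. It is not cudf or Arrow code: `encode_list_column`, `get_list_row`, and `encode_struct_column` are hypothetical helpers, and plain Python lists stand in for device buffers. It assumes the Arrow convention of storing one more offset than there are rows, so the last row also has an end offset.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple


def encode_list_column(
    rows: Sequence[Optional[Sequence[Any]]],
) -> Tuple[List[bool], List[int], List[Any]]:
    """Flatten rows into (validity, offsets, data). Hypothetical helper."""
    validity: List[bool] = []
    offsets: List[int] = [0]  # n + 1 entries for n rows
    data: List[Any] = []
    for row in rows:
        if row is None:
            validity.append(False)  # null list: contributes no data
        else:
            validity.append(True)
            data.extend(row)
        offsets.append(len(data))  # end of this row == start of the next
    return validity, offsets, data


def get_list_row(
    validity: List[bool], offsets: List[int], data: List[Any], i: int
) -> Optional[List[Any]]:
    """Read row i back: it spans data[offsets[i]:offsets[i + 1]]."""
    if not validity[i]:
        return None
    return data[offsets[i]:offsets[i + 1]]


def encode_struct_column(
    rows: Sequence[Optional[Dict[str, Any]]], fields: Sequence[str]
) -> Tuple[List[bool], Dict[str, List[Any]]]:
    """Split struct rows into a struct-level validity vector and one child
    column per field. None marks a null value inside a child column."""
    validity = [row is not None for row in rows]
    children = {
        name: [None if row is None else row.get(name) for row in rows]
        for name in fields
    }
    return validity, children


list_rows = [[1, 2], None, [], [3, 4, 5]]
l_validity, l_offsets, l_data = encode_list_column(list_rows)

struct_rows = [{"a": 1, "b": None}, None, {"a": None, "b": None}]
s_validity, s_children = encode_struct_column(struct_rows, ["a", "b"])
```

In this sketch the null list (row 1) and the empty list (row 2) both occupy zero elements in `l_data` and differ only in the validity vector. Likewise, the null struct (row 1) and the struct of null fields (row 2) have identical child values and are told apart only by `s_validity`.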

@jlowe jlowe added feature request New feature or request Needs Triage Need team to review and classify libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API. Java Affects Java cuDF API. Spark Functionality that helps Spark RAPIDS labels Sep 23, 2019
@jrhemstad jrhemstad changed the title [FEA] compound types support [FEA] Nested types support Sep 23, 2019
@jrhemstad
Contributor

jrhemstad commented Sep 23, 2019

I've changed the title since "compound" has a specific semantic meaning within libcudf++. Compound types refer to any type that has children, e.g., strings, dictionaries, nested, etc.

@jrhemstad jrhemstad removed the Needs Triage Need team to review and classify label Sep 23, 2019
@drabastomek

drabastomek commented Sep 24, 2019

I cannot stress enough how I would love to see this...

@revans2
Contributor

revans2 commented Sep 25, 2019

I would like to add that Spark has native support for maps. There has been some confusion in the Arrow documentation about maps, but generally they are represented as a list of key/value structs: List<Struct<Key, Value>>. The main reason I bring this up is that Parquet and ORC both support map types, and it would be good to have a "standard" representation that we can all agree on.
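
To make the List<Struct<Key, Value>> representation concrete, here is a minimal pure-Python sketch. It is not Spark, Arrow, or cudf code: `encode_map_column` and `get_map_row` are hypothetical helpers, and it assumes list-style offsets with one more entry than there are rows, pointing into two parallel child columns (keys and values) that together form the struct.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple


def encode_map_column(
    rows: Sequence[Optional[Dict[Any, Any]]],
) -> Tuple[List[bool], List[int], List[Any], List[Any]]:
    """Encode maps as a list of (key, value) structs. Hypothetical helper.

    Returns (validity, offsets, keys, values); keys and values are the two
    child columns of the struct and always have equal length.
    """
    validity: List[bool] = []
    offsets: List[int] = [0]
    keys: List[Any] = []
    values: List[Any] = []
    for row in rows:
        if row is None:
            validity.append(False)  # null map
        else:
            validity.append(True)
            for k, v in row.items():
                keys.append(k)
                values.append(v)
        offsets.append(len(keys))  # end of this row's entries
    return validity, offsets, keys, values


def get_map_row(
    validity: List[bool],
    offsets: List[int],
    keys: List[Any],
    values: List[Any],
    i: int,
) -> Optional[Dict[Any, Any]]:
    """Rebuild the map stored in row i from its slice of the child columns."""
    if not validity[i]:
        return None
    start, end = offsets[i], offsets[i + 1]
    return dict(zip(keys[start:end], values[start:end]))


map_rows = [{"a": 1, "b": 2}, None, {}, {"c": 3}]
m_validity, m_offsets, m_keys, m_values = encode_map_column(map_rows)
```

Because a map encoded this way is just a list column whose element type is a two-field struct, it needs no dedicated storage format beyond list and struct support.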

@BartleyR
Member

This would also be useful for us for a number of our use cases, including cyBERT post-processing where we have to remove overlapping columns between rows (created as an artifact of the training/inference phase).

@ntadimeti

Would love to have this feature.

@pinireisman

This will be invaluable for us, as we use lists as elements in pandas dataframes a lot and would love to switch to cudf!

@jrhemstad
Contributor

Going to close this as libcudf now has both struct and list types. Support is not complete across all functions, but individual issues can be filed if specific functionality is missing.

Feature Planning automation moved this from Needs prioritizing to Closed Mar 15, 2021

8 participants