[FEA] Nested types support #2857

jlowe · 2019-09-23T14:48:04Z

Is your feature request related to a problem? Please describe.
cudf columns should support compound data types (e.g., structs, lists).

Describe the solution you'd like
Using the same data layout as Arrow would be nice for compatibility. A struct would have child columns and a validity vector, so the struct itself can be null; a struct whose fields are all null is semantically different from a null struct. A list would contain the standard validity vector, a data vector containing the concatenated data across all rows, and an offset vector. The offset vector holds the start location of each row's list of data, plus one final entry marking the end of the last row. Therefore a row's data list starts at its own offset and ends just before the offset of the next row.
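To make the proposed layout concrete, here is a minimal Python sketch of how rows would be read back from an Arrow-style list column and struct column. It is an illustration of the buffer layout described above, not libcudf's API: the class names `ListColumn` and `StructColumn` and their fields are hypothetical, and real implementations store these buffers as packed device memory (with validity as a bitmask) rather than Python lists.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ListColumn:
    """Hypothetical Arrow-style list column: one validity flag per row,
    a flat data vector with every row's elements back to back, and an
    offsets vector with num_rows + 1 entries."""
    validity: List[bool]
    offsets: List[int]
    data: List[int]

    def row(self, i: int) -> Optional[List[int]]:
        # A null row has no list at all, which differs from an empty list.
        if not self.validity[i]:
            return None
        # Row i spans data[offsets[i]] up to (not including) data[offsets[i + 1]].
        return self.data[self.offsets[i]:self.offsets[i + 1]]


@dataclass
class StructColumn:
    """Hypothetical Arrow-style struct column: its own validity vector plus
    one child column per field, all with the same number of rows."""
    validity: List[bool]
    children: Dict[str, List[Optional[int]]]

    def row(self, i: int) -> Optional[Dict[str, Optional[int]]]:
        # A null struct is distinct from a struct whose fields are all null.
        if not self.validity[i]:
            return None
        return {name: child[i] for name, child in self.children.items()}


# Logical rows: [1, 2], null, [], [3, 4, 5]
lists = ListColumn(
    validity=[True, False, True, True],
    offsets=[0, 2, 2, 2, 5],
    data=[1, 2, 3, 4, 5],
)

# Logical rows: {a: 1, b: null}, null, {a: null, b: null}
structs = StructColumn(
    validity=[True, False, True],
    children={"a": [1, None, None], "b": [None, None, None]},
)
```

Reading `[lists.row(i) for i in range(4)]` yields `[[1, 2], None, [], [3, 4, 5]]`. Note that the null row and the empty row can share the same offsets; only the validity vector tells them apart, and likewise the struct's own validity separates row 1 (a null struct) from row 2 (a struct of nulls).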

jrhemstad · 2019-09-23T14:53:31Z

I've changed the title since "compound" has a specific semantic meaning within libcudf++. Compound types refer to any type that has children, e.g., strings, dictionaries, nested types, etc.

drabastomek · 2019-09-24T18:50:30Z

I cannot stress enough how much I would love to see this...

revans2 · 2019-09-25T12:47:30Z

I would like to add that Spark has native support for maps. There has been some confusion in the Arrow documentation about maps, but generally they are represented as a list of key/value structs: List<Struct<Key, Value>>. The main reason I add this is that Parquet and ORC both support map types, and it would be good to have a "standard" representation that we can all agree on.
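A minimal Python sketch of what the List<Struct<Key, Value>> representation means in practice: a map column is a list column whose flattened data is a struct with a key child and a value child, so decoding one row means slicing both children with the row's offsets. The function `map_row` and its parameters are hypothetical, chosen only for illustration; they are not part of cudf, Arrow, Parquet, or ORC.

```python
from typing import Dict, List, Optional


def map_row(
    validity: List[bool],
    offsets: List[int],
    keys: List[str],
    values: List[Optional[int]],
    i: int,
) -> Optional[Dict[str, Optional[int]]]:
    """Decode row i of a hypothetical map column stored as
    List<Struct<Key, Value>>: list validity + offsets over a struct
    whose children are the flattened key and value columns."""
    if not validity[i]:
        return None  # the whole map is null
    start, end = offsets[i], offsets[i + 1]
    # Each list element is one (key, value) struct entry.
    return {keys[j]: values[j] for j in range(start, end)}


# Logical rows: {"x": 1, "y": null}, null, {}, {"z": 3}
validity = [True, False, True, True]
offsets = [0, 2, 2, 2, 3]
keys = ["x", "y", "z"]
values = [1, None, 3]
```

Keys are kept non-null here while values may be null, which matches how Arrow's map type is usually described. Converting a row to a Python `dict` would silently collapse duplicate keys, which is one reason a list of structs, rather than a true hash map, is the common on-disk and in-memory form.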

BartleyR · 2020-02-26T14:31:47Z

This would also be useful for us for a number of our use cases, including cyBERT post-processing where we have to remove overlapping columns between rows (created as an artifact of the training/inference phase).

ntadimeti · 2020-03-18T19:30:21Z

Would love to have this feature.

pinireisman · 2020-07-01T07:28:40Z

This will be invaluable for us, as we use lists as elements in pandas DataFrames a lot, and would love to switch to cudf!

jrhemstad · 2021-03-15T18:35:40Z

Going to close this as libcudf now has both struct and list types. Support is not complete across all functions, but individual issues can be filed if specific functionality is missing.

jlowe added the feature request, Needs Triage, libcudf, Python, Java, and Spark labels Sep 23, 2019

jrhemstad changed the title from [FEA] compound types support to [FEA] Nested types support Sep 23, 2019

jrhemstad removed the Needs Triage label Sep 23, 2019

beckernick mentioned this issue Oct 4, 2019

[FEA] Support nested string types in columns (grandchildren) #2972 (Closed)

cwharris mentioned this issue Oct 17, 2019

[BUG] inconsistent behaviour of cudf.DataFrame and pandas.DataFrame from list of tuples #1705 (Closed)

harrism assigned trevorsm7 Dec 19, 2019

harrism added this to Needs prioritizing in Feature Planning via automation Dec 19, 2019

kkraus14 unassigned trevorsm7 Nov 18, 2020

jrhemstad closed this as completed Mar 15, 2021

Feature Planning automation moved this from Needs prioritizing to Closed Mar 15, 2021