Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Unify cudf::structs::detail::flatten_nested_columns and cudf::experimental::decompose_structs to improve performance for structs comparison #13032

Open
ttnghia opened this issue Mar 29, 2023 · 3 comments
Labels
0 - Backlog In queue waiting for assignment libcudf Affects libcudf (C++/CUDA) code. Performance Performance related issue proposal Change current process or code

Comments

@ttnghia
Copy link
Contributor

ttnghia commented Mar 29, 2023

For comparing structs column, both the legacy row comparators and the new experimental row comparators rely on struct flattening procedures. Each of them have their own flattening mechanism: cudf::structs::detail::flatten_nested_columns and cudf::experimental::decompose_structs. The difference between them are:

  • cudf::structs::detail::flatten_nested_columns replaces the input structs column with an optional column generated by materializing the input null mask.
  • cudf::experimental::decompose_structs doesn't materialize any new column. Instead, it replaces the input structs column with a modified version of it, which only has either zero or one child at the innermost level.

Although these APIs produce different output, these APIs do very similar job:

  • Both extract the input structs column into a table of children columns, which are much simpler than the input structs column to be compared on device code.
  • Both replace the input by a new column, and this new column is mainly used for checking nulls.

The issue of each from these approaches are:

  • cudf::structs::detail::flatten_nested_columns needs to materialize null mask of the input column into a real column.
  • cudf::experimental::decompose_structs still has a nested structs column in the output. Although that column only has zero or one child at the innermost level, it still causes performance degradation if its nested level is very high.

As such, we can unify the two approaches, taking the pros of both while eliminating the cons. The new flattening API should:

  • Avoid materializing new columns, and
  • Avoid output columns having more than one nested level.

This seems to be very straightforward with modifying the existing cudf::experimental::decompose_structs API.

@GregoryKimball
Copy link
Contributor

Hello @ttnghia, thinking about the priority of this suggestion, do we have a way to estimate the performance impact of high nesting levels for the cudf::experimental::decompose_structs implementation?

@GregoryKimball GregoryKimball added 0 - Backlog In queue waiting for assignment proposal Change current process or code libcudf Affects libcudf (C++/CUDA) code. Performance Performance related issue and removed feature request New feature or request Needs Triage Need team to review and classify labels Jun 7, 2023
@ttnghia
Copy link
Contributor Author

ttnghia commented Jun 7, 2023

do we have a way to estimate the performance impact of high nesting levels for the cudf::experimental::decompose_structs implementation?

We can run a benchmark comparing sortings of two non-nullable tables:

  1. A table with highly nested struct+list
  2. A table resulted from calling cudf::structs::detail::flatten_nested_columns on the table above.

The second table is flattened from the first table so it will have all columns having at max 1 nested level.

@GregoryKimball
Copy link
Contributor

@divyegala and I discussed this idea today, and we would like to also monitor the impact to peak memory usage when experimenting with this idea.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0 - Backlog In queue waiting for assignment libcudf Affects libcudf (C++/CUDA) code. Performance Performance related issue proposal Change current process or code
Projects
None yet
Development

No branches or pull requests

2 participants