Parallelize child column construction in scatter() for lists columns #6791
Conversation
Supports lists of:
1. fixed-width data types
2. strings
3. lists (of all of the above, and lists (of ...))

Also:
1. Additional type checking for child columns
2. Moved `list_device_view` functions inline
WIP: Attempting to replace the O(N**2) approach with a single `thrust::for_each_n()`. Currently broken because of empty lists; this will need a closer look.
Please update the changelog in order to start CI tests. View the gpuCI docs here.
Just change your scatter to an atomic increment rather than setting the value to 1 for each offset.
BTW, can you please file an issue for this for tracking (rather than just a PR)?
Hmmm, I realized that there's no way to atomically increment the scattered output with Thrust... This requires some thought.
Ah. You could use a …
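The overwrite behavior behind this exchange can be illustrated with a host-side sketch (plain C++ with `std::vector`, not cuDF code; the function name is my own): an unordered scatter, like the loop below, performs plain writes, so when repeated offsets from empty lists target the same destination index the result is a single write where an accumulating (atomic-increment) scatter would produce a count.

```cpp
#include <cstddef>
#include <vector>

// Host-side illustration (not cuDF code) of why a plain scatter cannot
// stand in for an atomic increment: when two source elements target the
// same destination index, the later write overwrites the earlier one
// instead of accumulating.
std::vector<int> plain_scatter(std::vector<int> const& values,
                               std::vector<int> const& indices,
                               std::size_t out_size)
{
  std::vector<int> out(out_size, 0);
  for (std::size_t i = 0; i < indices.size(); ++i) {
    out[indices[i]] = values[i];  // overwrite, no accumulation
  }
  return out;
}
```

Scattering 1 at indices {3, 5, 5} (the repeated 5 standing in for an empty list's duplicated offset) leaves position 5 holding 1 rather than 2, so a subsequent prefix sum undercounts every list after the empty one.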
Should be easy to support empty lists.
```cpp
// Helper to generate mapping between each child row and which list it belongs to.
rmm::device_vector<cudf::size_type> get_child_row_to_list_map(cudf::size_type num_child_rows,
                                                              column_view const& list_offsets,
                                                              rmm::cuda_stream_view stream)
{
  CUDF_EXPECTS(list_offsets.size() >= 2, "Invalid list offsets.");

  auto scatter_map   = cudf::slice(list_offsets, {1, list_offsets.size() - 1})[0];
  auto d_scatter_map = scatter_map.data<cudf::size_type>();
  auto ret = rmm::device_vector<cudf::size_type>(static_cast<std::size_t>(num_child_rows), 0);
  auto scatter_1 = thrust::make_constant_iterator<cudf::size_type>(1);

  thrust::scatter(rmm::exec_policy(stream)->on(stream.value()),
                  scatter_1,
                  scatter_1 + scatter_map.size(),
                  d_scatter_map,
                  ret.begin());

  thrust::inclusive_scan(rmm::exec_policy(stream)->on(stream.value()),
                         ret.begin(),
                         ret.end(),
                         ret.begin());

  return ret;
}
```
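The scatter-then-scan mapping above can be traced on the host with standard algorithms (a plain C++ sketch of my own, not cuDF code), assuming no list is empty so no offset value repeats: a 1 is scattered at each interior offset and an inclusive scan turns those list-boundary markers into the list index of every child row.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Host-side sketch of the scatter-then-scan mapping. Works only when no
// list is empty (i.e. no offset value repeats).
std::vector<int> child_row_to_list_map(std::vector<int> const& offsets,
                                       int num_child_rows)
{
  std::vector<int> map(static_cast<std::size_t>(num_child_rows), 0);
  // Scatter a 1 at every interior offset (skip the first and last entries).
  for (std::size_t i = 1; i + 1 < offsets.size(); ++i) { map[offsets[i]] = 1; }
  // Inclusive scan turns the boundary markers into list indices.
  std::partial_sum(map.begin(), map.end(), map.begin());
  return map;
}
```

With offsets {0, 3, 5, 9} (list sizes 3, 2, 4) this yields {0, 0, 0, 1, 1, 2, 2, 2, 2}.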
To support empty lists...
```cpp
// Helper to generate mapping between each child row and which list it belongs to.
rmm::device_vector<cudf::size_type> get_child_row_to_list_map(cudf::size_type num_rows,
                                                              cudf::size_type num_child_rows,
                                                              column_view const& list_offsets,
                                                              rmm::cuda_stream_view stream)
{
  CUDF_EXPECTS(list_offsets.size() >= 2, "Invalid list offsets.");

  auto d_scatter_map = list_offsets.data<cudf::size_type>();

  rmm::device_uvector<cudf::size_type> head_keys(num_rows, stream);
  rmm::device_uvector<cudf::size_type> head_flags(num_rows, stream);

  // Count how many times each offset value repeats (empty lists repeat offsets).
  auto new_end = thrust::reduce_by_key(rmm::exec_policy(stream)->on(stream.value()),
                                       list_offsets.begin<cudf::size_type>(),
                                       list_offsets.end<cudf::size_type>(),
                                       thrust::make_constant_iterator<cudf::size_type>(1),
                                       head_keys.begin(),
                                       head_flags.begin());

  auto ret = rmm::device_vector<cudf::size_type>(static_cast<std::size_t>(num_child_rows), 0);

  thrust::scatter(rmm::exec_policy(stream)->on(stream.value()),
                  head_flags.begin(),
                  new_end.second,
                  head_keys.begin(),
                  ret.begin());

  thrust::inclusive_scan(rmm::exec_policy(stream)->on(stream.value()),
                         ret.begin(),
                         ret.end(),
                         ret.begin());

  return ret;
}
```
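As a cross-check of the reduce-by-key idea, here is a host-side trace in plain C++ (my own sketch, not code from this thread). The trace surfaces two details worth noting: after scattering each offset's repeat count and scanning, every value is one greater than the list index, so a final subtract-one pass seems to be needed; and the terminal offset (which equals the number of child rows) has to be skipped to avoid an out-of-bounds write.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Host-side sketch: count how many times each offset value repeats
// (empty lists repeat their offset), scatter the counts, inclusive-scan,
// then subtract 1. Handles empty lists, including a leading empty list.
std::vector<int> child_row_to_list_map(std::vector<int> const& offsets,
                                       int num_child_rows)
{
  std::vector<int> map(static_cast<std::size_t>(num_child_rows), 0);
  // Run-length encode the (sorted) offsets and scatter each run's length.
  std::size_t i = 0;
  while (i < offsets.size()) {
    std::size_t j = i;
    while (j < offsets.size() && offsets[j] == offsets[i]) { ++j; }
    if (offsets[i] < num_child_rows) {  // skip the terminal offset
      map[offsets[i]] = static_cast<int>(j - i);
    }
    i = j;
  }
  std::partial_sum(map.begin(), map.end(), map.begin());
  for (auto& v : map) { v -= 1; }  // scanned counts are one past the list index
  return map;
}
```

For offsets {0, 3, 5, 5, 9} (list sizes 3, 2, 0, 4) this produces {0, 0, 0, 1, 1, 3, 3, 3, 3}, matching the expected mapping discussed below.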
A benchmark should be added as part of this PR, to demonstrate the value before and after and to catch future regressions. There is an existing scatter benchmark (under copying); we just need to add benchmark cases for the different nested types.
I finally got my head around what needs doing here, based on the following example from @harrism:
I've been slicing the offsets column before the scatter, and that should remain unchanged. The slice needs to happen after the …
This PR has been marked stale due to no recent activity in the past 30d. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be marked rotten if there is no activity in the next 60d.
@mythrocks do you plan to resume this?
I'd like to resume this work at some point. We were able to put the idea to use in #7189. This PR has gone stale; I'll raise a new PR when I do resume.
This is a followup to #6768 (which adds `scatter()` support for list columns). @harrism advises that the child-column construction could be a lot faster.

My initial attempt at this works for columns without empty lists. E.g.

For the above, we should ideally have produced `[0, 0, 0, 1, 1, 3, 3, 3, 3]` to correctly map the children back to the lists. This needs figuring out to support empty lists.
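One way to produce a mapping like `[0, 0, 0, 1, 1, 3, 3, 3, 3]` in the presence of empty lists (a host-side C++ sketch of my own, not code from this PR) is a binary search per child row: child row `i` belongs to the list whose offset is the last one not greater than `i`, which `std::upper_bound` locates directly. Empty lists fall out naturally, because their repeated offsets are simply skipped over by the search.

```cpp
#include <algorithm>
#include <vector>

// Host-side sketch (hypothetical helper): map a child row to its owning
// list with a binary search over the offsets. upper_bound returns the
// first offset greater than child_row; the list index is one before it.
int list_index_of_child_row(std::vector<int> const& offsets, int child_row)
{
  auto it = std::upper_bound(offsets.begin(), offsets.end(), child_row);
  return static_cast<int>(it - offsets.begin()) - 1;
}
```

With offsets {0, 3, 5, 5, 9} (list sizes 3, 2, 0, 4), child rows 0 through 8 map to 0, 0, 0, 1, 1, 3, 3, 3, 3, matching the expected output above; on the device, this per-row lookup could presumably be expressed as a `thrust::upper_bound` vectorized search over the child-row indices.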