
[BUG] String column copy performs a full gather #6803

Closed
jlowe opened this issue Nov 19, 2020 · 1 comment · Fixed by #6837

Labels
bug (Something isn't working) · libcudf (Affects libcudf (C++/CUDA) code) · Performance (Performance related issue) · Spark (Functionality that helps Spark RAPIDS) · strings (strings issues (C++ and Python))

Comments

jlowe (Member) commented Nov 19, 2020

Describe the bug
While analyzing an Nsight trace of a Spark query, I noticed 10ms being spent on a filter. Digging deeper, the filter time was spent in a cudf::table copy constructor, with most of that time going to copying a single string column within the table.

Spark queries often filter input data to remove nulls before further processing, and often nothing is actually filtered out. cudf::apply_boolean_mask, used to implement the Spark filter, has short-circuit logic to avoid a full gather: it copy-constructs the output table if it determines nothing will be filtered. However, copy-constructing a string column performs a full gather, which can be much more expensive than simply copying the input column buffers.
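
A minimal sketch of that short-circuit idea (not the actual libcudf implementation; the helper name, the use of thrust::count, and the exact constructor defaults here are assumptions for illustration):

#include <cudf/column/column_view.hpp>
#include <cudf/stream_compaction.hpp>
#include <cudf/table/table.hpp>
#include <rmm/cuda_stream_view.hpp>
#include <thrust/count.h>
#include <thrust/execution_policy.h>
#include <memory>

// Hypothetical helper: if every mask element is valid and true, nothing
// will be filtered, so a table copy suffices instead of a gather.
std::unique_ptr<cudf::table> filter_if_needed(cudf::table_view const& input,
                                              cudf::column_view const& mask,
                                              rmm::cuda_stream_view stream)
{
  auto const num_true = thrust::count(thrust::cuda::par.on(stream.value()),
                                      mask.begin<bool>(), mask.end<bool>(), true);
  if (mask.null_count() == 0 && num_true == mask.size()) {
    // Copy-construct the output table -- but as this issue reports, the
    // strings copy constructor currently gathers instead of memcpy-ing.
    return std::make_unique<cudf::table>(input);
  }
  return cudf::apply_boolean_mask(input, mask);  // genuine filtering path
}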

In this specific case, the string column consists of 178,200 rows of roughly 200 characters each. The string copy constructor took almost 10 milliseconds on the GPU even though the column occupies only about 30MB of device memory, which should take a small fraction of a millisecond to copy at typical device memory bandwidth.

Steps/Code to reproduce bug
Perform a copy of a string column containing many relatively long strings (e.g., 200+ characters per row). I've attached a gzipped Parquet file containing the string column from the specific filter case mentioned above for reference; a repro sketch follows below.

filtertest.parquet.gz
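
A minimal repro sketch, assuming the attachment has been decompressed to filtertest.parquet (the timing comment reflects the report above):

#include <cudf/io/parquet.hpp>
#include <cudf/table/table.hpp>

int main()
{
  auto const opts = cudf::io::parquet_reader_options::builder(
                      cudf::io::source_info{"filtertest.parquet"})
                      .build();
  auto const result = cudf::io::read_parquet(opts);

  // Copy-constructing the table triggers the strings gather path;
  // profile this line under Nsight to see the ~10ms of kernel time.
  cudf::table copy{result.tbl->view()};
  return 0;
}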

Expected behavior
Copying a strings column view that starts at base offset 0 should run at device memory speed, copying the underlying buffers rather than performing the more complicated gather computation (which also requires synchronizing and a device-to-host transfer).

jlowe added the bug, libcudf, Performance, and Spark labels on Nov 19, 2020
davidwendt added the strings label on Nov 19, 2020
jrhemstad (Contributor) commented

In addition, even when this operation does need to perform a gather, it is inefficient because it materializes the gather map:

rmm::device_vector<size_type> indices(strings_count);
thrust::sequence(execpol->on(stream), indices.begin(), indices.end(), start, step);
// create a column_view as a wrapper of these indices
column_view indices_view(
  data_type{type_id::INT32}, strings_count, indices.data().get(), nullptr, 0);
// build a new strings column from the indices
auto sliced_table = cudf::detail::gather(table_view{{strings.parent()}},
                                         indices_view,
                                         cudf::detail::out_of_bounds_policy::NULLIFY,
                                         cudf::detail::negative_index_policy::NOT_ALLOWED,
                                         mr,
                                         stream)
                      ->release();

This should just use a counting iterator passed to the detail version of gather that uses an iterator for the map.
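
A sketch of that suggestion, assuming the iterator-based overload of cudf::detail::gather in cudf/detail/gather.cuh takes a begin/end map-iterator pair (exact parameter order may differ across versions):

#include <cudf/detail/gather.cuh>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>

// Generate the map lazily as start + i * step: no device allocation and
// no thrust::sequence kernel launch. The device lambda requires compiling
// with nvcc --extended-lambda.
auto map_begin = thrust::make_transform_iterator(
  thrust::make_counting_iterator<cudf::size_type>(0),
  [start, step] __device__(cudf::size_type i) { return start + i * step; });

auto sliced_table = cudf::detail::gather(table_view{{strings.parent()}},
                                         map_begin,
                                         map_begin + strings_count,
                                         cudf::detail::out_of_bounds_policy::NULLIFY,
                                         stream,
                                         mr)
                      ->release();

For the common step == 1 case, a plain counting iterator starting at start would suffice without the transform wrapper.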

harrism pushed a commit that referenced this issue Nov 26, 2020
…6837)

Fixes #6803

This optimizes string slice copying in the case where the string column view starts at offset 0. In that case the offset values do not need to be modified, and all of the column buffers can be copied directly.
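
For context, a sketch of the assumed shape of that fast path (not necessarily the exact code in #6837, and assuming the 2020-era API where strings_column_view exposes offsets() and chars() children): when the view's offset is 0, the offsets child is already zero-based, so every buffer can be duplicated verbatim.

#include <cudf/column/column_factories.hpp>
#include <cudf/null_mask.hpp>
#include <cudf/strings/strings_column_view.hpp>
#include <memory>

std::unique_ptr<cudf::column> copy_strings(cudf::strings_column_view const& strings,
                                           rmm::cuda_stream_view stream,
                                           rmm::mr::device_memory_resource* mr)
{
  if (strings.offset() == 0) {
    // Fixed-width children copy at device memory speed.
    auto offsets = std::make_unique<cudf::column>(strings.offsets(), stream, mr);
    auto chars   = std::make_unique<cudf::column>(strings.chars(), stream, mr);
    return cudf::make_strings_column(strings.size(),
                                     std::move(offsets),
                                     std::move(chars),
                                     strings.null_count(),
                                     cudf::copy_bitmask(strings.parent(), stream, mr));
  }
  // Sliced views (offset != 0) still need offset adjustment / gather.
  return nullptr;  // placeholder: fall back to the gather-based path
}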